Methods and compositions for synthetic RNA endonucleases

ABSTRACT

The present invention provides sequence specific restriction enzymes for site-specific cleavage of RNA, as well as methods of their use.

STATEMENT OF PRIORITY

This application is a divisional application of, and claims priority to, U.S. application Ser. No. 13/805,240, filed Jan. 31, 2013, which is a 35 U.S.C. §371 national phase application of Application Serial No. PCT/US2011/040933, filed Jun. 17, 2011, which claims the benefit, under 35 U.S.C. §119(e), of U.S. Provisional Application Ser. No. 61/356,340, filed Jun. 18, 2010, the entire contents of each of which are incorporated herein by reference.

STATEMENT REGARDING ELECTRONIC FILING OF A SEQUENCE LISTING

A Sequence Listing in ASCII text format, submitted under 37 C.F.R. §1.821, entitled 5470-561v2_ST25.txt, 139,617 bytes in size, generated on Jun. 29, 2016 and filed via EFS-Web, is provided in lieu of a paper copy. This Sequence Listing is hereby incorporated by reference into the specification for its disclosures.

FIELD OF THE INVENTION

The present invention is directed to sequence specific restriction enzymes for site-specific cleavage of RNA, as well as methods of their use.

BACKGROUND OF THE INVENTION

Ribonucleases play important roles in various pathways of nucleic acid metabolism, including control of gene expression, mRNA surveillance and degradation and host defense mechanisms against RNA viruses (1-3). Since the first ribonuclease was discovered as a heat stable enzyme from pancreas capable of digesting yeast RNA, a diverse panel of RNases has been characterized. However, unlike DNA restriction enzymes, a protein enzyme that cleaves RNA in a sequence-specific manner has not been found in nature. The known RNA endonucleases either specifically cleave their target through recognition of certain structures (e.g., RNase III family, RNase H or most ribozymes) (4-6), or have essentially no sequence specificity (e.g., RNase A cleaves after pyrimidine residues and RNase T1 cleaves after G residues) (7). The sequence specific cleavage of RNA can be achieved by a large multi-component complex such as the spliceosome or the RISC complex in the RNAi pathway, each of which requires guide RNA to recognize its targets (8, 9) and involves large protein/RNA assemblies, limiting their application in probing structured RNA or manipulating recombinant RNA in vitro.

Sequence specific cleavage of RNA has been achieved using engineered hammerhead ribozymes or RNA-cleaving DNAzymes (10). Both types of enzyme recognize their substrates through Watson-Crick binding arms of 6-12 nt, and therefore can achieve high target selectivity. However, these nucleic acid enzymes generally have a low turnover rate (with k_cat around 1 min⁻¹) compared to protein enzymes, possibly due to tight binding to their substrates. In addition, the in vitro application of such nucleic acid enzymes is compromised by the high production cost and low stability of RNA, as well as the difficulty in controlling the folding of single stranded RNA or DNA.

The present invention overcomes previous shortcomings in the art by providing site specific RNA endonucleases and methods of their use.

SUMMARY OF THE INVENTION

In one aspect, the present invention provides a synthetic RNA endonuclease comprising the formula: A-B-C, wherein: A is an RNA binding domain, B is a linker peptide, and C is a cleavage domain.

In addition, the present invention provides a method of detecting a target RNA in a sample, comprising: a) contacting the sample with the RNA endonuclease of this invention under conditions whereby cleavage of RNA occurs if the target RNA is present in the sample and wherein the RNA binding domain of the RNA endonuclease is modified to bind the target RNA; and b) detecting a cleavage product of the target RNA, thereby detecting the target RNA in the sample.

Furthermore, the present invention provides a method of cleaving a target mRNA in a sample, comprising contacting the sample with the RNA endonuclease of this invention under conditions whereby cleavage of the target mRNA occurs and wherein the RNA binding domain of the RNA endonuclease is modified to bind the target mRNA, thereby cleaving the target mRNA in the sample.

In yet further aspects of this invention, a method is provided of cleaving a target mRNA in a cell, comprising introducing into the cell the RNA endonuclease of this invention, wherein the RNA binding domain of the RNA endonuclease is modified to bind the target mRNA, under conditions whereby cleavage of the mRNA occurs, thereby cleaving the target mRNA in the cell.

Also provided herein is a method of inhibiting expression of a target gene in a cell, comprising introducing into the cell the RNA endonuclease of this invention, wherein the RNA binding domain of the RNA endonuclease is modified to bind mRNA encoding a gene product of the target gene, under conditions whereby cleavage of the mRNA occurs, thereby inhibiting expression of the target gene in the cell.

The present invention additionally provides a method of cleaving a target mRNA in a mitochondrion in a cell, comprising introducing into the cell the RNA endonuclease of this invention, wherein the RNA binding domain of the RNA endonuclease is modified to bind the target mRNA in the mitochondrion and wherein the RNA endonuclease comprises a mitochondrial targeting signal sequence, under conditions whereby cleavage of the target mRNA in the mitochondrion occurs, thereby cleaving the target mRNA in the mitochondrion in the cell.

Further provided herein is a method of inhibiting expression of a target mitochondrial gene in a cell, comprising introducing into the cell the RNA endonuclease of this invention, wherein the RNA binding domain of the RNA endonuclease is modified to bind mRNA encoding a gene product of the target mitochondrial gene and wherein the RNA endonuclease comprises a mitochondrial targeting signal sequence, under conditions whereby cleavage of the target mRNA in the mitochondrion occurs, thereby inhibiting expression of the target mitochondrial gene in the cell.

Additional aspects of this invention include a method of treating dystrophia myotonica (DM) in a subject, comprising administering to the subject an effective amount of the RNA endonuclease of this invention, wherein the RNA binding domain of the RNA endonuclease is modified to bind mRNA encoding (CUG)n repeats in the 3′ UTR of DMPK to treat DM1 and/or mRNA encoding (CCUG)n repeats in intron 1 of ZNF9 to treat DM2 and wherein the RNA endonuclease comprises a mitochondrial targeting signal sequence, thereby treating DM in the subject.

The present invention also provides a method of detecting an RNA virus in a sample, comprising: a) contacting the sample with the RNA endonuclease of this invention under conditions whereby cleavage of RNA occurs if RNA of the RNA virus is present in the sample and wherein the RNA binding domain of the RNA endonuclease is modified to bind a target RNA of the RNA virus; and b) detecting a cleavage product of the target RNA, thereby detecting the RNA virus in the sample.

A method is also provided herein of diagnosing a viral infection in a subject, comprising: a) contacting a sample from the subject with the RNA endonuclease of this invention under conditions whereby cleavage of RNA occurs if viral RNA is present in the sample and wherein the RNA binding domain of the RNA endonuclease is modified to bind viral RNA; and b) detecting a cleavage product of the viral RNA, thereby detecting viral RNA in the sample and thereby diagnosing a viral infection in the subject.

Furthermore, the present invention provides a method of identifying a strain of an RNA virus in a sample, comprising: a) contacting the sample with the RNA endonuclease of this invention under conditions whereby cleavage of RNA occurs if the RNA of the strain of the RNA virus is present in the sample and wherein the RNA binding domain of the RNA endonuclease is modified to bind a target RNA specific to the strain of the RNA virus; and b) detecting a cleavage product of the target RNA, thereby identifying the strain of the RNA virus in the sample.

The foregoing and other aspects of the present invention will now be described in more detail with respect to other embodiments described herein. It should be appreciated that the invention can be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

DESCRIPTION OF THE FIGURES

FIG. 1. Design and activity validation of artificial sequence specific RNA endonucleases (ASREs). (Panel A) Structures of the PUF domain of pumilio1 with NRE-19 RNA (1M8W) and the PIN domain of Smg6 (2HWW). The PUF domain contains eight repeats, each recognizing a single RNA base. Three amino acids from each repeat interact with the Watson-Crick edge of an RNA base and determine the binding specificity of the repeat (left panel). The PIN domain has an RNase H like active site with three Asp residues and co-ordinates one divalent metal cation (right panel). (Panel B) ASRE with N-terminal PUF, C-terminal PIN and a heptapeptide linker was incubated with a cognate RNA substrate for 30 min. Site-specific cleavage of RNA was obtained with the ASRE containing a wild type PIN domain (lane 2) but not with ASRE containing a mutated PIN domain (lane 3). (Panel C) Inverted ASRE (PIN-PUF fusion protein from N to C terminus) caused complete, non-specific cleavage of an RNA substrate. The ASRE in PUF-PIN orientation was included as control (lane 1).

FIG. 2. Biochemical characterization of ASREs. (Panel A) The linker length affects the activity of ASRE. ASRE with a tripeptide linker showed limited activity whereas efficient cleavage activity was achieved with medium (heptapeptide) and long linkers (dodecapeptide). At longer reaction time, non-specific digestion products (indicated by asterisks in lane 5) were observed with ASRE containing a dodecapeptide linker. (Panel B) Digestion time course of 7u6g RNA by ASRE(6-2/7-2). Digestion was followed in the standard reaction condition substituted with 3 mM Mn²⁺. (Panel C) RNA substrates containing either an NRE site (UGUAUAUA) or 7u6g site (UugAUAUA) were incubated with ASRE(Wt) or ASRE(6-2/7-2). Lanes 1 and 4 are controls without enzyme. (Panel D) ASRE shows divalent metal ion selectivity, with Mn²⁺ yielding highest activity and Mg²⁺ and Co²⁺ giving lower activity. The concentrations of divalent metal ions were 3 mM in all lanes. (Panel E) Semilog plots were used to determine the pseudo first-order reaction rates of ASRE catalyzed RNA cleavage in the presence of 2.5 mM (♦), 5 mM (▪) and 7.5 mM (▴) Mn²⁺. Non-specific digestion occurred with Mn²⁺ concentration greater than 7.5 mM (data not shown).

FIG. 3. Kinetic parameters and cleavage sites of ASREs. (Panel A) Plot of initial reaction rates vs. substrate concentration for ASRE(wt). The enzyme was incubated with various concentrations of cognate substrate for 5 min and the initial reaction rates were measured as the amount of RNA cleaved per minute. Kinetic constants were determined by fitting the curve to a Michaelis-Menten kinetic model. (Panel B) Plot of initial reaction rates vs. substrate concentration for ASRE(6-2/7-2). Reaction conditions and data analysis were the same as panel A. (Panel C) The cleavage site was mapped by 5′ or 3′ DSS-RACE from gel purified RNA products. The positions of two cleavage sites are indicated with arrows and the relative frequencies are plotted. The 8-nt binding sequence of PUF is shown in bold. (Panel D) The “curve back” model of ASRE best explains the cleavage positions mapped in panel C.
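For reference, the Michaelis-Menten model named in Panels A and B of FIG. 3 relates the initial rate to the substrate concentration as written below. The symbols (v_0, V_max, K_M, [S], [E]_0) are standard textbook notation assumed here for illustration and are not taken from the figure itself.

```latex
v_0 = \frac{V_{\max}\,[S]}{K_M + [S]}, \qquad k_{cat} = \frac{V_{\max}}{[E]_0}
```

Under this model, fitting the measured initial rates against substrate concentration gives V_max and K_M, and k_cat follows from V_max once the total enzyme concentration is known.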

FIG. 4. Using ASRE to silence gene expression in bacterial cells. (Panel A) The activity of β-galactosidase in E. coli strains transformed with expression plasmids of ASRE(lacZ) and control ASREs. Expression of ASRE and endogenous lacZ gene was induced with IPTG. For each ASRE strain, lacZ expression for five independent clones (N=5) over three experiments (in triplicates) was measured to circumvent clonal variation. The percent of β-galactosidase activity was normalized to clones transformed with the empty vector (N=2) and induced with the same condition. The β-galactosidase activity was also measured for uninduced clones containing empty vector as the baseline activity. Control ASREs include the non-specific ASRE(87621) that targets a different sequence and the mutated ASRE(LacZ) containing a D1353A mutation in the PIN domain active site. (Panel B) The levels of lacZ mRNA were measured by real time RT-PCR. The cell samples were the same as in panel A. LacZ mRNA levels were normalized to ftsZ mRNA and the relative RNA abundances compared to uninduced controls are plotted. In all figures, error bars represent the standard error across all clones. The lacZ mRNA was partially degraded in D1353A mutated ASRE(LacZ), consistent with a previous finding that the PIN with a single mutation at D1353 still had partial activity in vivo¹⁷.

FIG. 5. Sequence alignment of human, mouse, rat Pumilio 1 (hpum1, Mpum1, Ratpum1, SEQ ID NOS:55-57), human and mouse Pumilio2 (hpum2, Mpum2, SEQ ID NOS:58 and 59) and Drosophila Pumilio (Dspum, SEQ ID NO:60).

FIG. 6. Sequence alignment and comparison of the Drosophila Pumilio (PumDr, SEQ ID NO:61) and human PUM1 (SEQ ID NO:62) and PUM2 (SEQ ID NO:63) proteins.

FIG. 7. Unique RNA binding mode of PUF domain. (Panel A) Crystal structure of a PUF domain that binds to RNA. (Panel B) Schematic diagram of modular recognition code of each PUF repeat and the RNA base. The 8 PUF repeats bind RNA in an anti-parallel fashion, with each repeat recognizing a single base. Each base was stacked between two PUF repeats by the Tyr, His or Arg residue, and the side chains of two amino acid residues in each repeat determine which RNA base to recognize. The U will be recognized by NxxxQ (SEQ ID NO:64) residues in a PUF repeat, the G will be recognized by SxxxE (SEQ ID NO:65), and the A will be recognized by (C/S)xxxQ (SEQ ID NO:66). By mutating each repeat, a PUF domain can be generated that specifically recognizes an 8-nt sequence with high affinity.
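The recognition code in the FIG. 7 caption can be read as a lookup table. The following is a minimal sketch under stated assumptions, not the procedure used to design the PUF domains of this invention. The names PUF_CODE and design_repeats are hypothetical. The pairing of repeat number (9 minus base position) is an assumption drawn from the anti-parallel binding mode and is consistent with the FIG. 28 caption, where repeat 6 reads base 3. Only the U, G and A motifs listed in this caption are included.

```python
# Two-residue recognition motifs quoted in the FIG. 7 caption (U, G and A only).
PUF_CODE = {"U": "NxxxQ", "G": "SxxxE", "A": "(C/S)xxxQ"}


def design_repeats(target):
    """Return {repeat_number: motif} for an 8-nt RNA target written 5' to 3'.

    Assumption: binding is anti-parallel, so repeat 8 reads base 1 and
    repeat 1 reads base 8 (repeat = 9 - position).
    """
    target = target.upper()
    if len(target) != 8:
        raise ValueError("target must be exactly 8 nt")
    plan = {}
    for position, base in enumerate(target, start=1):
        if base not in PUF_CODE:
            raise ValueError(f"no motif listed in this caption for base {base!r}")
        plan[9 - position] = PUF_CODE[base]
    return plan


def changed_repeats(target_a, target_b):
    """List the repeats whose motif differs between two 8-nt targets."""
    plan_a, plan_b = design_repeats(target_a), design_repeats(target_b)
    return sorted(r for r in plan_a if plan_a[r] != plan_b[r])
```

Under these assumptions, comparing the NRE site (UGUAUAUA) with the 7u6g site (UugAUAUA) from FIG. 2 with changed_repeats reports differences only at repeats 6 and 7.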

FIG. 8. Sequence alignment of the PIN domains of EST1A family proteins from Homo sapiens (hu_pin, SEQ ID NO:67), Mus musculus (Ms_pin, SEQ ID NO:68), Drosophila melanogaster (Dm_pin, SEQ ID NO:69), and Caenorhabditis elegans (Ce_pin, SEQ ID NO:70) (upper alignment).

FIG. 9. Structure of the potential functional residues of the Smg6 PIN domain and archaeal PIN-domain proteins.

FIG. 10. Structurally similar proteins.

FIG. 11. Structure of RNase A-like fold.

FIG. 12. Structure of RNase H-like fold.

FIG. 13. (Panel A) Expression and purification of ASRE proteins. The proteins were purified from soluble bacterial fraction using Ni-NTA column chromatography. The purification yields milligram levels of homogeneous preparation as shown in the figures. The approximate molecular weight of the protein is 63 kDa. (Panel B) ASRE activity with varying pH.

FIG. 14. High-fidelity cleavage of ASRE. ASRE(6-2/7-2) is used to test the activity in cleaving different substrates including NRE (UGUAUAUA), 3g (UGUAUgUA), 5g3g1g (UGUgUgUg), 87621 (gugAUAag, also named as 8g7u6g2a1g) and its cognate substrate, 7u6g, in a standard digestion assay. No digestion is obtained with non-specific substrates in the reaction conditions tested.

FIG. 15. Measurement of initial rate of ASRE catalyzed cleavage. (Panel A) A representative example of urea-PAGE gel used for determination of kinetic parameters of ASRE(6-2/7-2). Different amounts of RNA substrates were incubated with ASRE or buffer for only 5 min, and equal volumes of digested products and undigested controls were loaded in adjacent lanes of the PAGE gel. (Panel B) Plot of RNA band intensity vs. the input RNA amount of undigested control samples, indicating the linear range of quantification.

FIG. 16. Single stranded DNA does not interfere with ASRE activity. Addition of single stranded 60 nt DNA in the reaction mixture does not inhibit ASRE activity. RNA substrate (7u6g) is digested with ASRE in a 10 min reaction in the presence of ss DNA (the molar ratio of RNA:DNA is 1:1 in lane 2 and 1:10 in lanes 3-4). Experimental controls (RNase A and RNase H) are also shown.

FIG. 17. Schematic diagram of DSS-RACE. Top panel shows digestion of RNA substrate by ASRE; the 5′ portion of the product is shown in grey and the 3′ end in black. The cut site is either mapped by 3′ RACE (Panel A) or 5′ RACE (Panel B). A number of clones (˜40) were sequenced for determination of cleavage sites.

FIG. 18. Sequence at the cleavage sites as mapped by DSS-RACE. (Panel A) Part of the sequence near the recognition site of GU/UG RNA substrate (SEQ ID NO:86) is shown with the PUF recognition site marked in bold. (Panel B) The 3′ end sequence of the 5′ cleavage product (SEQ ID NOS:87 and 88). The major cut site is marked with an asterisk. (Panel C) The 5′ end sequence of the 3′ cleavage product (SEQ ID NOS:89 and 90). The major cut site is marked with an asterisk. An extra “G” residue (boxes) was usually added due to the terminal deoxynucleotide transferase activity of reverse transcriptase.

FIG. 19. Decrease in β-galactosidase protein level in ASRE(LacZ) treated cells. The cell lysates from independent clones of bacteria expressing control ASRE (lanes 1-3), D1353A mutated ASRE(LacZ) (lanes 4-6), and ASRE(LacZ) were separated by SDS-PAGE and analyzed by western blot with antibody against β-galactosidase (top panel) or GroEL (middle panel). The expression level of ASREs was determined by Coomassie staining in the bottom panel.

FIG. 20. Proposed models for the non-specific cleavage of the PIN-PUF fusion protein. (Panel A) The ASRE with PUF-PIN orientation from N- to C-terminus. The PIN domain is probably curved back with the active site facing the PUF domain and RNA substrate. (Panel B) Proposed model of fusion protein in PIN-PUF orientation. The active site of the PIN domain may face away from PUF, therefore it can non-specifically cleave RNA. (Panel C) An alternative proposed model of fusion protein in PIN-PUF orientation. The C-terminal residue/linker of the PUF domain is very flexible, therefore allowing RNA to access the PIN active site and be cleaved non-specifically.

FIG. 21. (Panel A) Diagram of human mitochondrial genome. The 13 protein coding genes are shown and the tRNA genes are represented with a black or grey bar. (Panel B) Diagram of three ASREs used in experiments. (Panel C) Subcellular localization of mitoASRE and control ASREs. HeLa cells were transiently transfected with expression vectors and the ASREs were detected with anti-flag antibody. Mitotrackers were used to mark the mitochondria. (Panel D) The decrease of ND5 mRNA as judged by real time RT-PCR. (Panel E) The change of ND5 protein level measured by western blots. Samples are assayed 24 hours after tetracycline (tet) induction.

FIG. 22. RNA pathogenesis model of DM1. (Panel A) The RNA repeats in the 3′ UTR of DMPK (SEQ ID NO:91) form a hairpin structure that sequesters MBNL1 and increases CUGBP1 level. (Panel B) Changes of balance for splicing factor level cause many genes to shift to embryonic splicing pattern in DM1 adults, and such splicing shifts cause various symptoms of DM1.

FIG. 23. Engineering ASRE for specific RNA cleavage. The ASRE was constructed by fusing N-terminal PUF and C-terminal PIN domains with a 7 aa linker. RNA substrates containing a PUF binding site (grey box) were incubated with the purified ASRE at 37° C. Left panel: time course of ASRE catalyzed RNA cleavage. Right panel: two ASREs, recognizing Wt or 7u6g targets that differ by 2 nt. ASREs specifically cleave their cognate substrate.

FIG. 24. Identification of PUF binding code for C residue. (Panel A) Diagram of yeast 3-hybrid (Y3H) screen to identify the C binding code of the PUF repeat. (Panel B) Measurement of the background of Y3H screen with positive and negative controls. Two independent yeast clones were assayed for β-gal activity, and the dark staining indicates the RNA-protein interaction. From top to bottom: no RNA control; a known RNA-protein interaction pair; Wt PUF and Wt RNA; and Wt PUF and U3C mutated RNA. (Panel C) Randomized library for residues 16 and 20 in PUF repeat 6 (SEQ ID NO:92). (Panel D) The SxxxR (SEQ ID NO:143) combination at PUF repeat 6 makes the PUF domain specifically recognize C at position 3. PUF-Eco is a control PUF domain with an EcoRI site (equivalent of two amino acids) inserted between S and R residues. All RNA targets containing U, A, C and G in position 3 were tested for the specific binding of PUF:RNA, which was determined as the LacZ activity in the Y3H system. Higher LacZ activity reflects stronger binding between RNA and protein.

FIG. 25. Engineering PUF domains to specifically recognize (CUG)_(n) repeats. (Panel A) The (CUG)_(n) repeat (SEQ ID NO:144) can be recognized as 8-nt through three different frames, thus 3 possible PUF domains can be engineered. (Panel B) Step-wise mutating each PUF repeat to generate PUF domains that recognize different targets. The top row represents the starting sequences recognized by two modified PUF domains. Using site directed mutagenesis on each repeat, the PUF binding specificity was changed to recognize different intermediates, and the bottom sequences are (CUG)_(n) frames 1 and 2 that were recognized by the PUF-D and PUF-E. (Panel C) Measurement of RNA:Protein interaction using Y3H system. The binding between wild type PUF and its cognate RNA was used as positive control.
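The statement in Panel A that a trinucleotide repeat presents three distinct 8-nt frames can be checked with a short enumeration. This is an illustrative sketch only; the function name repeat_frames is hypothetical, and the order in which the windows are returned is not meant to correspond to the "frame 1" and "frame 2" labels used in the figure.

```python
def repeat_frames(unit, window=8):
    """Return the distinct windows of length `window` found in a long
    tandem repeat of `unit`, one per starting offset within the unit."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    # Tile the unit enough times that every offset has a full-length window.
    tiled = unit * (window // len(unit) + 2)
    return [tiled[offset:offset + window] for offset in range(len(unit))]


# The (CUG)n repeat gives three 8-nt windows:
# ['CUGCUGCU', 'UGCUGCUG', 'GCUGCUGC']
cug_frames = repeat_frames("CUG")
```

The same function applied to a four-nucleotide unit such as CCUG returns four windows, since each starting offset within the unit gives a different 8-nt sequence.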

FIG. 26. Effects of linker sequence and structure on ASRE activity. The ASRE(6-2/7-2) containing different linker sequences (SEQ ID NOS:145-148) was incubated with its cognate substrate and the cleavage of the RNA was detected with urea-PAGE gel. The different linkers were chosen from the protein inter-domain linker database. Both the sequence and the PDB code are shown. The secondary structures of the linkers are: 1a104_1 (SEQ ID NO:145): HHHHTTCCCCCCCHHHH; 1sesA_1 (SEQ ID NO:146): HHHHHHHHHHH; 1qaxA_3 (SEQ ID NO:147): HHHHTTTHHH; 1pfkA_2 (SEQ ID NO:148): CGGGGGCSCC. The structure of inter-domain linkers was determined in the natural context of respective proteins. The notation of secondary structure follows the standard DSSP code (H, α helix; T, hydrogen bonded turn; C, coil; G, 3₁₀ helix; S, bend). See Table 2 for details.

FIG. 27. Identification of a cytosine-recognition code by yeast three-hybrid screen. (Panel A) Schematic representation of the interaction between wild-type PUF and its RNA target sequence (5′-UGUAUAUA). Protein repeats are indicated by squares and RNA bases by ovals (dashed lines, hydrogen bonds; parentheses, van der Waals contacts). For library screening, the third RNA base was mutated to cytosine (C3) and served as a new target. Nucleotides encoding positions 1043 and 1047 of the PUF were randomized in the screened library. (Panel B) Illustration of the yeast three-hybrid assay used to screen the PUF library for binding to C3 RNA (5′-ugCauaua-3′) and to measure PUF-RNA interaction. The interaction between Gal4-PUF and target RNA fused with MS2 binding sequence can trigger the expression of both reporter genes, LacZ and HIS3. (Panel C) Sequences of the PUF library (SEQ ID NO:92) with randomized coding sequences at positions 1043 and 1047. (Panel D) Validation of the yeast three-hybrid system. The expression of the reporter genes, LacZ (left panel) and HIS3 (right panel), was measured for yeast expressing wild-type PUF and the wild type RNA (U3) or the mutated RNA C3. Positive binding was found only when wild-type PUF and U3 RNA were expressed. (Panel E) Measurement of specific interactions between PUF domains and RNAs with base substitutions at position 3. Positions of the mutated amino acids and RNA bases are indicated in the left panel. Protein-RNA binding was measured with the yeast three-hybrid system using liquid β-galactosidase assays. For each sample, 12 colonies were picked and the experiments were performed in triplicate. The β-galactosidase activity relative to that of the wild-type PUF-U3 RNA pair was plotted to reflect the strength of protein-RNA interaction (right). The fold increase in binding of the mutant protein to the cognate base vs. the non-cognate base in the wild-type RNA is indicated above the bars.

FIG. 28. The cytosine-recognition code can be transferred to other PUM repeats. (Panel A) Mutation of PUM repeat 2 to convert its binding specificity to recognize C7 RNA. Indicated mutations were introduced in repeat 2 (left). Protein-RNA binding measured with the yeast three-hybrid system using β-galactosidase activity is shown (right). Wild-type PUF and its cognate target RNA were included in all experiments as controls and its relative activity set to 1. (Panel B) Mutation of PUM repeat 5 to convert its binding specificity to recognize C4 RNA. Indicated mutations were introduced in repeat 5. (Panel C) Mutation of PUM repeat 6 to convert its binding specificity to recognize C3 RNA. Indicated mutations, including two different stacking residues, were introduced. (Panel D) Mutation of PUM repeat 7 to convert its binding specificity to recognize C2 RNA. Indicated mutations, including two different stacking residues, were introduced. For panels A-D, the fold increase in binding of the mutant protein to the cognate base vs. the non-cognate base in the wild-type RNA is indicated above the bars. (Panel E) Mutation of PUM repeat 3 to convert its binding specificity to recognize C6 RNA. Indicated mutations, including mutation of the base stacking residue of repeat 4, were introduced. For all panels, the experimental conditions and data analyses are similar to those in panel A.

FIG. 29. Designed PUF domains that recognize targets with multiple cytosines. (Panel A) Diagram showing the mutations in PUM repeats 2 and 6 to recognize C3C7 RNA. Relative protein-RNA binding is shown as in FIG. 28. (Panel B) Stepwise generation of a PUF mutant (PUF-D) that can bind to (CUG)_(n) repeat RNA. Diagrams showing the mutations in PUM repeats 1, 3, 4, 5 and 6 (PUF-D) or in PUM repeats 1, 2, 3, 5, 7 and 8 (PUF-E) to recognize (CUG)_(n) repeat RNA. (Panel C) Relative protein-RNA binding to WT or (CUG)₅ RNA is shown as in FIG. 28.

FIG. 30. Crystal structure of PUF-R6(SYxxR) in complex with C3 RNA. (Panel A) Interaction of PUF-R6(SYxxR) with C3 RNA. Ribbon diagram of interaction of repeat 6 with C3 base (complex 1 with chains A and C shown). (Panel B) Interaction of wild-type PUF(NYxxQ) with U3 RNA. Ribbon diagram of interaction of repeat 6 with U3 base. RNA and base-interacting side chains are shown as stick models. Hydrogen bonds are indicated with dashed lines. This figure was created with PyMol.

FIG. 31. Using the cytosine-recognition code to direct engineered splicing factors. (Panel A) Modulating alternative splicing of a cassette exon in a reporter RNA. Diagram of how the two types of ESFs can affect splicing of a cassette exon (left). Gly-PUF ESF directed to the exonic target can increase exon inclusion whereas the RS-PUF ESF can decrease exon inclusion. RT-PCR products of splicing reactions (top right) and quantification of splicing (bottom right). The splicing reporter gene and expression vectors for different ESFs were co-transfected at 1:2 ratio into 293T cells. Total RNA was purified 24 hours after transfection and splicing of the test exon was detected with RT-PCR. The percentage of the exon-included isoform among all isoforms is represented by the psi (ψ) value (percent-spliced-in). The transfections were carried out in duplicate, and means of the ψ value were plotted with error bars indicating the data range. Significant changes (p values are 0.04 and 0.01 for lanes 2 and 3 as judged by paired t-test) were observed for ESFs that recognize the cognate C-containing target. (Panel B) Design of ESFs to target endogenous VEGF-A pre-mRNA splicing. The gene (SEQ ID NOS:93 and 94) and protein sequences (SEQ ID NOS:95 and 96) of VEGF-A in the region near the alternative splice sites are shown with two PUFs recognizing different cytosine-containing sequences. To shift the splicing towards anti-angiogenic VEGF-A isoforms, the cultured MDA-MB-231 cells were transfected with 1 μg expression vectors of Gly-PUF#1 or RS-PUF#2. Total RNA was purified 24 hours after transfection to detect VEGF-A splicing by RT-PCR. The percentages of b isoforms were quantified and are plotted below the gel.

FIG. 32. Natural PUF proteins with putative cytosine-recognition code. (Panel A) Alignment and phylogenetic tree of the putative C-recognition PUM repeat in Nop9p homologs from yeast, plants, filamentous fungi and protists (SEQ ID NOS:97-126). The query sequences were selected to maximize the divergence of the species, but are otherwise arbitrary. The Giardia protein EES98274 was chosen as the outgroup in the phylogenetic tree. (Panel B) Alignment of the putative C-recognition PUM repeat in Nop9p homologs from the HomoloGene database (SEQ ID NOS:127-142). The homologous Volvox carteri protein (Accession No. XP_002952190) was included in the alignment as the outgroup in the phylogenetic tree. The conserved positions for cytosine recognition are highlighted.

DETAILED DESCRIPTION OF THE INVENTION

The present invention will now be described more fully hereinafter. This invention may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the present application and relevant art and should not be interpreted in an idealized or overly formal sense unless expressly so defined herein. All publications, patent applications, patents and other references mentioned herein are incorporated by reference in their entirety.

Also as used herein, “and/or” refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative (“or”).

Unless the context indicates otherwise, it is specifically intended that the various features of the invention described herein can be used in any combination.

Moreover, the present invention also contemplates that in some embodiments of the invention, any feature or combination of features set forth herein can be excluded or omitted.

To illustrate, if the specification states that a complex comprises components A, B and C, it is specifically intended that any of A, B or C, or a combination thereof, can be omitted and disclaimed singularly or in any combination.

As used herein, the transitional phrase “consisting essentially of” (and grammatical variants) is to be interpreted as encompassing the recited materials or steps “and those that do not materially affect the basic and novel characteristic(s)” of the claimed invention. See, In re Herz, 537 F.2d 549, 551-52, 190 U.S.P.Q. 461, 463 (CCPA 1976) (emphasis in the original); see also MPEP § 2111.03. Thus, the term “consisting essentially of” as used herein should not be interpreted as equivalent to “comprising.”

The term “about,” as used herein when referring to a measurable value such as an amount or concentration (e.g., the signal-to-background ratio) and the like, is meant to encompass variations of 20%, 10%, 5%, 1%, 0.5%, or even 0.1% of the specified amount.
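As an illustration only, the definition of “about” can be read as a relative tolerance test. The sketch below assumes the variation is symmetric around the specified amount; the function name `is_about` and its default tolerance are hypothetical and are not part of the specification.

```python
def is_about(value: float, specified: float, tolerance: float = 0.20) -> bool:
    """Return True if `value` lies within +/- `tolerance` (a fraction,
    e.g. 0.20 for 20%) of `specified`.

    Assumption: the variation is applied symmetrically and relative to
    the specified amount.
    """
    return abs(value - specified) <= tolerance * abs(specified)


# Tolerance levels named in the definition, expressed as fractions.
TOLERANCES = (0.20, 0.10, 0.05, 0.01, 0.005, 0.001)

if __name__ == "__main__":
    # A 9-unit measurement against a specified amount of 10 units.
    for tol in TOLERANCES:
        print(f"{tol:.1%}: {is_about(9, 10, tol)}")
```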

The present invention provides a new class of synthetic RNA endonucleases that are analogous to DNA restriction enzymes. These artificial site-specific RNA endonucleases (ASREs) were created by combining an RNA endonuclease (e.g., the PIN domain of human SMG6) with a series of modified PUF domains specifically recognizing an 8-nt region of RNA as an RNA substrate. The ASRE binds this RNA substrate with high affinity and efficiently makes a single cleavage near the binding site. ASREs containing different PUF domains can recognize and cleave distinct substrates without detectable activity between non-cognate ASRE/RNA pairs. Such cleavage requires manganese (II) ions, and a mutation at the active site of the PIN domain abolishes the endonuclease activity. Since a PUF domain can be designed to recognize almost any 8-nt RNA, a large panel of ASREs can be designed to recognize and cleave various RNA targets. Expression of an ASRE of this invention that recognizes an endogenous mRNA in bacterial cells can silence the target gene by inducing mRNA cleavage. This new class of RNA endonucleases provides a useful tool to manipulate RNA in vitro, and for silencing gene expression in organisms where interfering RNA (RNAi) machinery is not available.

Thus, in one embodiment, the present invention provides a synthetic RNA endonuclease comprising the formula: A-B-C, wherein: A is an RNA binding domain, B is a linker peptide, and C is a cleavage domain.
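The A-B-C formula can be pictured as a simple record of three amino acid strings joined in order. The following is a minimal sketch, assuming one-letter amino acid codes; the class name `SyntheticEndonuclease` is hypothetical, and the domain strings used in the usage lines are arbitrary placeholders rather than real domain sequences (only the linker VDTGNGS is taken from this description).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticEndonuclease:
    """Hypothetical container for the A-B-C architecture."""
    binding_domain: str   # A: RNA binding domain (one-letter amino acid codes)
    linker: str           # B: linker peptide
    cleavage_domain: str  # C: cleavage domain

    def sequence(self) -> str:
        # Concatenate in the N- to C-terminal order A, then B, then C.
        return self.binding_domain + self.linker + self.cleavage_domain


if __name__ == "__main__":
    # "GGGG" and "KKKK" are placeholders standing in for real domains.
    example = SyntheticEndonuclease("GGGG", "VDTGNGS", "KKKK")
    print(example.sequence())
```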

RNA Binding Domain

In various embodiments, the RNA binding domain can be a Pumilio homology domain (PUM-HD) and in particular embodiments, the PUM-HD can be a human Pumilio 1 (PUF) domain. However, it is to be understood that the RNA binding domain can be any protein or peptide that binds RNA in a sequence specific manner. In particular, the RNA binding domain can be any protein or peptide that contains modular armadillo repeats (ARM repeats) and in particular can be any such modular protein that binds RNA in a sequence specific manner wherein the RNA specificity can be changed by modifying the amino acid side chain(s) of the protein. Thus, the RNA binding domain can be, for example, any PUF protein family member with a PUM-HD domain. Nonlimiting examples of a PUF family member include FBF in C. elegans, Ds pum in Drosophila and PUF proteins in plants such as Arabidopsis and rice (the genes are yet to be named). A phylogenetic tree of the PUM-HDs of Arabidopsis, rice and other plant and non-plant species is provided in Tam et al. (“The Puf family of RNA-binding proteins in plants: phylogeny, structural modeling, activity and subcellular localization” BMC Plant Biol. 10:44, 2010, the entire contents of which are incorporated by reference herein). PUF family members are highly conserved from yeast to human and all members of the family bind to RNA in a sequence specific manner with a predictable code. The accession number for the domain is PS50302 in the Prosite database (Swiss Institute of Bioinformatics) and a sequence alignment of some of the members of this family is shown in FIG. 5 (a ClustalW multiple sequence alignment of human, mouse and rat Pumilio 1 (hpum1, Mpum1, Ratpum1) and human and mouse Pumilio 2 (hpum2, Mpum2), respectively). The Drosophila pum1 is very different in length from the mammalian Pumilio 1 homologues. Only the C-terminal PUM-HD domain is shown in the sequence alignment.

FIG. 6 shows the ClustalW amino acid sequence alignment and comparison of the Drosophila Pumilio (PumDr) and human PUM1 and PUM2 proteins. The N-terminal part of human and fly Pum proteins shows weak homology (40% similarity) and differs significantly in size and protein sequence. The C-terminal part shows a very high degree of homology and evolutionary conservation (78% identity, 86% similarity for PUM1 and 79% identity, 88% similarity for PUM2), with highly conserved protein sequence and structure of the Pum RNA-binding domain. In all three proteins PUM-HD is composed of the N-terminal conserved part of 20 amino acids, eight Pum repeats of 36 amino acids each, and the C-terminal conserved region. In human Pumilio proteins the C-conserved part is 44 amino acids long, whereas Drosophila protein has an insert of additional 85 amino acids in the C-conserved region. The nucleotide and amino acid sequences can be found in the DDBJ/EMBL/GenBank® databases under accession nos. AF315592 (PUM1) and AF315591 (PUM2) (Spassov & Jurecic, “Cloning and comparative sequence analysis of PUM1 and PUM2 genes, human members of the Pumilio family of RNA-binding proteins” Gene, 299:195-204, October 2002, the entire contents of each of which (publication and sequences) are incorporated by reference herein).

In some embodiments, the PUF domain of this invention can be made up of eight 36 mers, in which 33 of the amino acids are conserved and the 34th, 35th and 36th amino acids can vary, imparting specificity for a particular base in an RNA sequence. In particular embodiments, the RNA binding domain is about 300 (e.g., 310, 309, 308, 307, 306, 305, 304, 303, 302, 301, 300, 299, 298, 297, 296, 295, 294, 293, 292, 291, 290, etc.) amino acids in length. In particular embodiments, the RNA cleavage domain is about 120 amino acids in length. In some embodiments, the RNA endonuclease of this invention is designed to bind to a specific RNA sequence of about 8 nucleotides (e.g., 8-16 contiguous RNA bases) to position the cleavage domain to cut the RNA at a specific site. In some embodiments, the RNA can be cut between any of the 8 contiguous RNA bases as well as at any site upstream and/or downstream of the 8-nt sequence bound by the RNA binding domain (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleotides upstream or downstream of the 8-nt RNA sequence). In particular embodiments, the fifth nucleotide of the 8-nt sequence is a U or C, while the other 7 nucleotides can vary.
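To make the target-site constraint concrete, the sketch below scans an RNA string for 8-nt windows whose fifth nucleotide is U or C. It is a minimal sketch under stated assumptions: positions are counted 5′ to 3′ along the RNA (the paragraph above does not state the counting direction), DNA-style input is converted by replacing T with U, and the function name `candidate_sites` is hypothetical.

```python
def candidate_sites(rna: str, site_len: int = 8):
    """List (start_index, site) for every window of `site_len` nucleotides
    whose fifth nucleotide (index 4, counted 5' to 3' by assumption)
    is U or C. Windows containing non-ACGU characters are skipped.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    for start in range(len(rna) - site_len + 1):
        site = rna[start:start + site_len]
        if set(site) <= set("ACGU") and site[4] in "UC":
            hits.append((start, site))
    return hits


if __name__ == "__main__":
    # Arbitrary example RNA, not taken from the specification.
    for start, site in candidate_sites("GGUGUAUAUAGG"):
        print(start, site)
```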

One aspect of this invention is the ability to modify the RNA binding domain of the ASRE of this invention to target a specific RNA sequence. Thus, in some embodiments, the RNA endonuclease of this invention comprises an RNA binding domain (e.g., PUM-HD) that is modified to bind an RNA sequence that is different than the RNA sequence bound by an unmodified (e.g., wild type) RNA binding domain. The RNA sequence can be about an 8 mer (e.g., an 8 mer, 9 mer, 10 mer, 11 mer, 12 mer, 13 mer, 14 mer, 15 mer, 16 mer, etc.). The ability to introduce modifications into the amino acid sequence of the RNA binding domain to alter its specificity for a target RNA sequence is based on the known interactions of bases with the different amino acid side chains of the RNA binding domain (e.g., Puf protein) (FIG. 1A). FIG. 7 shows the RNA recognition code of the PUF domain. Thus, the recognition code can be generally written as:

-   SerXXXGlu→G (guanine, SEQ ID NO:1),
-   CysArgXXXGln→A (adenine, SEQ ID NO:2),
-   AsnXXXGln→U (uracil, SEQ ID NO:3), and
-   SerTyrXXArg→C (cytosine, SEQ ID NO:4),

where X is any amino acid. In some embodiments, the recognition code for A (adenine) can be SerArgXXXGln (SEQ ID NO:5).
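The recognition code above lends itself to a lookup table. The sketch below, offered as an illustration rather than a design tool, returns the motif listed for each base of a target RNA in the order the bases are given. It deliberately does not assign motifs to numbered PUF repeats, because the mapping between repeat number and nucleotide position is not stated in this passage; the names `RECOGNITION_CODE` and `motifs_for_target` are hypothetical.

```python
# Motifs as written in the recognition code above (X = any amino acid).
RECOGNITION_CODE = {
    "G": "SerXXXGlu",     # SEQ ID NO:1
    "A": "CysArgXXXGln",  # SEQ ID NO:2 (SerArgXXXGln, SEQ ID NO:5, is an alternative)
    "U": "AsnXXXGln",     # SEQ ID NO:3
    "C": "SerTyrXXArg",   # SEQ ID NO:4
}


def motifs_for_target(rna: str):
    """Return the recognition motif for each base of `rna`, in input order.

    Raises ValueError if the string contains anything other than A, C, G, U.
    """
    rna = rna.upper()
    unknown = sorted(set(rna) - set(RECOGNITION_CODE))
    if unknown:
        raise ValueError(f"unsupported bases: {unknown}")
    return [RECOGNITION_CODE[base] for base in rna]


if __name__ == "__main__":
    # Arbitrary 8-nt example target.
    for base, motif in zip("UGUAUAUA", motifs_for_target("UGUAUAUA")):
        print(base, motif)
```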

Other RNA binding domains that can be employed in the RNA endonuclease of this invention include RNA binding domains (RBDs), such as in most splicing proteins, including heterogeneous nuclear ribonucleoproteins (hnRNPs) and the K homology group of proteins (KH loop proteins); any of which can be used as the RNA binding module of the ASREs of this invention.

Linker Peptide

The ASREs of this invention comprise a linker peptide, which can be from about three amino acids to about 20 amino acids in length (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, etc., amino acids). In some embodiments, a particular linker peptide of this invention is VDTGNGS (SEQ ID NO:18). However, it is to be understood that any suitable linker peptide can be used in the ASRE of this invention. Basic structural guidelines for selection of a linker peptide sequence are that the linker sequence is ideally rich in neutral to polar amino acids that have a slight helical propensity. In some embodiments, the linker peptide forms an alpha helical structure. Proline (Pro) and aromatic amino acids (Phe, Tyr, Trp) are typically not used in the linker peptide sequence. Thus, in some embodiments, a linker peptide of this invention does not comprise a proline, a phenylalanine, a tyrosine and/or a tryptophan. Table 2 and FIG. 26 show various nonlimiting examples of linker peptides of this invention.
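The linker guidelines above can be expressed as a simple screening check. The following sketch flags a candidate linker that falls outside a length window or that contains proline or an aromatic residue. The function name `linker_issues` is hypothetical, the default limits of 3 and 20 residues are taken from the “about three” to “about 20” range and are treated here as hard cut-offs only for illustration, and helical propensity is not evaluated.

```python
# One-letter codes for Pro, Phe, Tyr, Trp, which are typically avoided.
DISFAVORED = set("PFYW")


def linker_issues(linker: str, min_len: int = 3, max_len: int = 20):
    """Return a list of human-readable issues; an empty list means the
    linker passes these two illustrative checks."""
    linker = linker.upper()
    issues = []
    if not (min_len <= len(linker) <= max_len):
        issues.append(f"length {len(linker)} outside {min_len}-{max_len}")
    found = sorted(set(linker) & DISFAVORED)
    if found:
        issues.append("contains disfavored residues: " + ", ".join(found))
    return issues


if __name__ == "__main__":
    print(linker_issues("VDTGNGS"))  # the linker of SEQ ID NO:18
    print(linker_issues("GP"))       # too short and contains proline
```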

Cleavage Domain

In various embodiments of the RNA endonuclease of this invention, the cleavage domain can be the PilT N-terminus (PIN) domain of SMG6. However, it is to be understood that any suitable cleavage domain can be used in the ASREs of this invention. In general, a cleavage domain of this invention typically would not exceed 30 kDa and it would have independent activity in trans. The cleavage domain may also have an RNase H/A-like fold at the active site lined by acidic residues (Asp/Glu) or His, which acts via a metal ion (divalent or tetravalent) and can cleave the phosphodiester bond in the nucleic acid backbone.

In particular embodiments, the PIN domain of hSMG6 (EST1A; GenBank® Database Accession No. NM_017575, incorporated by reference herein; synonyms include C17orf31, KIAA0732 and SMG-6) can be used in the ASREs of this invention (see FIG. 1A for PIN domain structure). FIG. 8 shows a sequence alignment of the PIN domains of EST1A family proteins from Homo sapiens (hu_pin), Mus musculus (Ms_pin), Drosophila melanogaster (Dm_pin), and Caenorhabditis elegans (Ce_pin) (upper alignment). Conserved residues are shown by asterisk. The PIN domain has an RNase H-like active site fold and is also very similar in active site architecture to an archaebacterial PIN domain.

FIG. 9 shows the structure of the potential functional residues of the smg6 PIN domain and archaeal PIN-domain proteins. A stereoview of the residues at the putative active sites of the smg6 PIN domain is shown. Molecules A and B in the asymmetric unit are superimposed. Residues D1251, E1282, D1353, T1390, and D1392 are shown as stick models, and the four water molecules are shown as spheres. The hydrogen bonds are shown as dashed lines.

FIG. 10 shows structurally similar proteins. (Panel A) Topology diagrams of the PIN domain of hEST1A, archaeal PIN-domain protein PAE2754, and the 5′ nuclease domain of Taq polymerase. Strands are represented by arrows and helices by rectangles. The core structures of these proteins consist of a parallel β-sheet with strand order 32145. The structurally similar elements are shown in different grey shades. Asterisks indicate the residue positions at the active site of Taq polymerase and at the putative active sites of the hEST1A PIN domain and PAE2754. (Panel B) Ribbon representations of the PIN domain of hEST1A, PAE2754, and the 5′ nuclease domain of Taq polymerase. The three structures are shown in a similar orientation. The secondary structure elements of the 5′ nuclease fold are shown in grey shades [same as in (Panel A)]. Stars indicate the active site of the 5′ nuclease domain of Taq polymerase and the putative active sites of the hEST1A PIN domain and PAE2754 (Takeshita et al. “Crystal structure of the PIN domain of human telomerase-associated protein EST1A” Proteins 68:980-989, 2007, the entire contents of which are incorporated by reference herein).

In some embodiments of this invention, the RNA cleavage domain can include an RNase A-like fold (FIG. 11) and/or an RNase H-like fold (FIG. 12).

RNase A is a relatively small protein (124 residues, ˜13.7 kDa). It can be characterized as a two-layer α+β protein that is folded in half, with a deep cleft for binding the RNA substrate. The first layer is composed of three alpha helices (residues 3-13, 24-34 and 50-60) from the N-terminal half of the protein. The second layer consists of three β-hairpins (residues 61-74, 79-104 and 105-124 from the C-terminal half) arranged in two β-sheets. The hairpins 61-74 and 105-124 form a four-stranded, antiparallel β-sheet that lies on helix 3 (residues 50-60). The longest β-hairpin 79-104 mates with a short β-strand (residues 42-45) to form a three-stranded, antiparallel β-sheet that lies on helix 2 (residues 24-34).

RNase A has four disulfide bonds in its native state: Cys26-Cys84, Cys58-Cys110, Cys40-Cys95 and Cys65-Cys72. The first two (26-84 and 58-110) are essential for conformational folding; each joins an alpha helix of the first layer to a beta sheet of the second layer, forming a small hydrophobic core in its vicinity. The latter two disulfide bonds (40-95 and 65-72) are less essential for folding; either one can be reduced (but not both) without affecting the native structure under physiological conditions. These disulfide bonds connect loop segments and are relatively exposed to solvent. The 65-72 disulfide bond has an extraordinarily high propensity to form, significantly more than would be expected from its loop entropy, both as a peptide and in the full-length protein. This suggests that the 61-74 β-hairpin has a high propensity to fold conformationally. The active site is lined by His 119, Lys 66, Lys 41 and His 12 (PDB ID 2aas).

The RNase H domain is responsible for hydrolysis of the RNA portion of RNA-DNA hybrids, and this activity requires the presence of divalent cations (Mg²⁺ or Mn²⁺) that bind its active site. This domain is a part of a large family of homologous RNase H enzymes of which the RNase HI protein from Escherichia coli is the best characterized. Secondary structure predictions for the enzymes from E. coli, yeast, human liver and diverse retroviruses (such as Rous sarcoma virus and the Foamy viruses) supported, in every case, the five beta-strands (1 to 5) and four or five alpha-helices (A, B/C, D, E) that have been identified by crystallography in the RNase H domain of human immunodeficiency virus 1 (HIV-1) reverse transcriptase and in E. coli RNase H.

In some embodiments of the RNA endonuclease of this invention, the RNA binding domain can be at the amino terminus of the endonuclease and the cleavage domain can be at the carboxy terminus of the endonuclease. In this orientation, sequence specific cleavage can be achieved. In other embodiments, the cleavage domain can be at the amino terminus of the endonuclease and the RNA binding domain can be at the carboxy terminus of the endonuclease. In this orientation, nonspecific cleavage of RNA can be achieved (see FIG. 1, Panels B and C and Examples section below).

In some embodiments of this invention, the RNA endonuclease can comprise, consist of, or consist essentially of the following amino acid sequences, designated ASRE(WT), ASRE(6-2/7-2), ASRE(6-2/7-2/1-1), ASRE(531), ASRE(87621), ASRE(LacZ), ASRE(3-2) and mitoASRE(ND5), which are provided as nonlimiting examples of the ASREs of this invention, as it is readily apparent to the skilled artisan that numerous other ASREs can be designed according to the teachings of this invention to target any RNA sequence for cleavage. The nomenclature of the exemplary ASREs and their respective targets are provided in Table 5.

ASRE (6-2/7-2) (SEQ ID NO: 6)GRSRLLEDFRNNRYPNLQLREIAGHIMEFSQDQHGSRFIQLKLERATPAERQLVFNEILQAAYQLMVDVFGNYVIQKFFEFGSLEQKLALAERIRGHVLSLALQMYGCRVIQKALEFIPSDQQNEMVRELDGHVLKCVKDQNGNHVVQKCIECVQPQSLQFIIDAFKGQVFALSTHPYGCRVIQRILEHCLPDQTLPILEELHQHTEQLVQDQYGSYVIEHVLEHGRPEDKSKIVAEIRGNVLVLSQHKFANNVVQKCVTHASRTERAVLIDEVCTMNDGPHSALYTMMKDQYANYVVQKMIDVAEPGQRKIVMHKIRPHIATLRKYTYGKHILAKLEKYYMKNGVDLGVDTGNGSQMELEIRPLFLVPDTNGFIDHLASLARLLESRKYILVVPLIVINELDGLAKGQETDHRAGGYARVVQEKARKSIEFLEQRFESRDSCLRALTSRGNELESIAFRSEDITGQLGNNDDLILSCCLHYCKDKAKDFMPASKEEPIRLLREVVLLTDDRNLRVKALTRNVPVRDIPAFLTWAQVG ASRE (WT)(SEQ ID NO: 7) GRSRLLEDFRNNRYPNLQLREIAGHIMEFSQDQHGSRFIQLKLERATPAERQLVFNEILQAAYQLMVDVFGNYVIQKFFEFGSLEQKLALAERIRGHVLSLALQMYGCRVIQKALEFIPSDQQNEMVRELDGHVLKCVKDQNGNHVVQKCIECVQPQSLQFIIDAFKGQVFALSTHPYGCRVIQRILEHCLPDQTLPILEELHQHTEQLVQDQYGNYVIQHVLEHGRPEDKSKIVAEIRGNVLVLSQHKFASNVVEKCVTHASRTERAVLIDEVCTMNDGPHSALYTMMKDQYANYVVQKMIDVAEPGQRKIVMHKIRPHIATLRKYTYGKHILAKLEKYYMKNGVDLGVDTGNGSQMELEIRPLFLVPDTNGFIDHLASLARUESRKYILVVPLIVINELDGLAKGQETDHRAGGYARVVQEKARKSIEFLEQRFESRDSCLRALTSRGNELESIAFRSEDITGQLGNNDDLILSCCLHYCKDKAKDFMPASKEEPIRLLREVVLLTDDRNLRVKALTRNVPVRDIPAFLTWAQVG ASRE (6-2/7-2/1-1)(SEQ ID NO: 8) GRSRLLEDFRNNRYPNLQLREIAGHIMEFSQDQHGSRFIELKLERATPAERQLVFNEILQAAYQLMVDVFGNYVIQKFFEFGSLEQKLALAERIRGHVLSLALQMYGCRVIQKALEFIPSDQQNEMVRELDGHVLKCVKDQNGNHVVQKCIECVQPQSLQFIIDAFKGQVFALSTHPYGCRVIQRILEHCLPDQTLPILEELHQHTEQLVQDQYGSYVIEHVLEHGRPEDKSKIVAEIRGNVLVLSQHKFANNVVQKCVTHASRTERAVLIDEVCTMNDGPHSALYTMMKDQYANYVVQKMIDVAEPGQRKIVMHKIRPHIATLRKYTYGKHILAKLEKYYMKNGVDLGVDTGNGSQMELEIRPLFLVPDTNGFIDHLASLARLLESRKYILVVPLIVINELDGLAKGQETDHRAGGYARVVQEKARKSIEFLEQRFESRDSCLRALTSRGNELESIAFRSEDITGQLGNNDDLILSCCLHYCKDKAKDFMPASKEEPIRLLREVVLLTDDRNLRVKALTRNVPVRDIPAFLTWAQVG  ASRE (5-3-1)(SEQ ID NO: 9) 
GRSRLLEDFRNNRYPNLQLREIAGHIMEFSQDQHGSRFIELKLERATPAERQLVFNEILQAAYQLMVDVFGNYVIQKFFEFGSLEQKLALAERIRGHVLSLALQMYGSRVIEKALEFIPSDQQNEMVRELDGHVLKCVKDQNGNHVVQKCIECVQPQSLQFIIDAFKGQVFALSTHPYGSRVIERILEHCLPDQTLPILEELHQHTEQLVQDQYGNYVIQHVLEHGRPEDKSKIVAEIRGNVLVLSQHKFASNVVEKCVTHASRTERAVLIDEVCTMNDGPHSALYTMMKDQYANYVVQKMIDVAEPGQRICIVMHKIRPHIATLRKYTYGKHILAKLEKYYMKNGVDLGVDTGNGSQMELEIRPLFLVPDTNGFIDHLASLARLLESRKYILVVPLIVINELDGLAKGQETDHRAGGYARVVQEKARKSIEFLEQRFESRDSCLRALTSRGNELESIAFRSEDITGQLGNNDDLILSCCLHYCKDKAKDFMPASKEEPIRLLREVVLLTDDRNLRVKALTRNVPVRDIPAFLTWAQVG  ASRE (87621)(SEQ ID NO: 10) GRSRLLEDFRNNRYPNLQLREIAGHIMEFSQDQHGSRFIELKLERATPAERQLVFNEILQAAYQLMVDVEGCRVIQKFFEFGSLEQKLALAERIRGHVLSLALQMYGCRVIQKALEFIPSDQQNEMVRELDGHVLKCVKDQNGNHVVQKCIECVQPQSLQFIIDAFKGQVFALSTHPYGCRVIQRILEHCLPDQTLPILEELHQHTEQLVQDQYGSYVIEHVLEHGRPEDKSKIVAEIRGNVLVLSQHKFANNVVQKCVTHASRTERAVLIDEVCTMNDGPHSALYTMMKDQYASYVVEKMIDVAEPGQRKIVMHKIRPHIATLRKYTYGKHILAKLEKYYMKNGVDLGVDTGNGSQMELEIRPLFLVPDTNGFIDHLASLARLLESRKYILVVPLIVINELDGLAKGQETDHRAGGYARVVQEKARKSIEFLEQRFESRDSCLRALTSRGNELESIAFRSEDITGQLGNNDDLILSCCLHYCKDKAKDFMPASKEEPIRLLREVVLLTDDRNLRVKALTRNVPVRDIPAFLTWAQVG ASRE (LacZ)(SEQ ID NO: 11) GRSRLLEDFRNNRYPNLQLREIAGHIMEFSQDQHGSRFIQLKLERATPAERQLVFNEILQAAYQLMVDVEGCRVIQKFFEFGSLEQKLALAERIRGHVLSLALQMYGSRVIEKALEFIPSDQQNEMVRELDGHVLKCVKDQNGNHVVQKCIECVQPQSLQFIIDAFKGQVFALSTHPYGCRVIQRILEHCLPDQTLPILEELHQHTEQLVQDQYGSYVIEHVLEHGRPEDKSKIVAEIRGNVLVLSQHKFASNVVEKCVTHASRTERAVLIDEVCTMNDGPHSALYTMMKDQYANYVVQKMIDVAEPGQRKIVMHKIRPHIATLRKYTYGICHILAKLEKYYMKNGVDLGVDTGNGSQMELEIRPLFLVPDTNGFIDHLASLARLLESRKYILVVPLIVINELDGLAKGQETDHRAGGYARVVQEKARKSIEFLEQRFESRDSCLRALTSRGNELESIAFRSEDITGQLGNNDDLILSCCLHYCKDKAKDFMPASKEEPIRLLREVVLLTDDRNLRVKALTRNVPVRDIPAFLTWAQVG ASRE (3-2)(SEQ ID NO: 12) 
GRSRLLEDFRNNRYPNLQLREIAGHIMEFSQDQHGSRFIQLKLERATPAERQLVFNEILQAAYQLMVDVFGNYVIQKFFEFGSLEQKLALAERIRGHVLSLALQMYGSRVIEKALEFIPSDQQNEMVRELDGHVLKCVKDQNGNHVVQKCIECVQPQSLQFIIDAFKGQVFALSTHPYGCRVIQRILEHCLPDQTLPILEELHQHTEQLVQDQYGNYVIQHVLEHGRPEDKSKIVAEIRGNVLVLSQHKFASNVVEKCVTHASRTERAVLIDEVCTMNDGPHSALYTMMKDQYANYVVQKMIDVAEPGQRKIVMHKIRPHIATLRKYTYGKHILAKLEKYYMKNGVDLGVDTGNGSQMELEIRPLFLVPDTNGFIDHLASLARLLESRKYILVVPLIVINELDGLAKGQETDHRAGGYARVVQEKARKSIEFLEQRFESRDSCLRALTSRGNELESIAFRSEDITGQLGNNDDLILSCCLHYCKDKAKDFMPASKEEPIRLLREVVLLTDDRNLRVKALTRNVPVRDIPAFLTWAQVG mitoASRE (ND5)(SEQ ID NO: 13) MLFNLRILLNNAAFRNGHNFMVRNFRCGQPLQNKVQDYKDDDDKEFGRSRLLEDFRNNRYPNLQLREIAGHIMEFSQDQHGSRFIELKLERATPAERQLVFNEILQAAYQLMVDVFGNYVIQKFFEFGSLEQKLALAERIRGHVLSLALQMYGSRVIEKALEFIPSDQQNEMVRELDGHVLKCVKDQNGNHVVQKCIECVQPQSLQFIIDAFKGQVFALSTHPYGCRVIQRILEHCLPDQTLPILEELHQHTEQLVQDQYGNYVIQHVLEHGRPEDKSKIVAEIRGNVLVLSQHKFANNVVQKCVTHASRTERAVLIDEVCTMNDGPHSALYTMMKDQYANYVVQKMIDVAEPGQRKIVMHKIRPHIATLRKYTYGKHILAKLEKYYMKNGVDLGVDTGNGSQMELEIRPLFLVPDTNGFIDHLASLARLLESRKYILVVPLIVINELDGLAKGQETDHRAGGYARVVQEKARKSIEFLEQRFESRDSCLRALTSRGNELESIAFRSEDITGQLGNNDDLILSCCLHYCKDKAKDFMPASKEEPIRLLREVVLLTDDRNLRVKALTRNVPVRDIPAFLTWAQVG 

ASREs with mutations that were used as controls in the studies described herein are provided below.

ASRE (6-2/7-2) with D->A mutation (SEQ ID NO: 14)GRSRLLEDFRNNRYPNLQLREIAGHIMEFSQDQHGSRFIQLKLERATPAERQLVFNEILQAAYQLMVDVFGNYVIQKFFEFGSLEQKLALAERIRGHVLSLALQMYGCRVIQKALEFIPSDQQNEMVRELDGHVLKCVKDQNGNHVVQKCIECVQPQSLQFIIDAFKGQVFALSTHPYGCRVIQRILEHCLPDQTLPILEELHQHTEQLVQDQYGSYVIEHVLEHGRPEDKSKIVAEIRGNVLVLSQHKFANNVVQKCVTHASRTERAVLIDEVCTMNDGPHSALYTMMKDQYANYVVQKMIDVAEPGQRKIVMHKIRPHIATLRKYTYGKHILAKLEKYYMKNGVDLGVDTGNGSQMELEIRPLFLVPDTNGFIDHLASLARLLESRKYILVVPLIVINELDGLAKGQETDHRAGGYARVVQEKARKSIEFLEQRFESRDSCLRALTSRGNELESIAFRSEDITGQLGNNADLILSCCLHYCKDKAKDFMPASKEEPIRLLREVVLLTDDRNLRVKALTRNVPVRDIPAFLTWAQVG ASRE (LacZ) with D->A mutation(SEQ ID NO: 15) GRSRLLEDFRNNRYPNLQLREIAGHIMEFSQDQHGSRFIQLKLERATPAERQLVFNEILQAAYQLMVDVFGCRVIQKFFEFGSLEQKLALAERIRGHVLSLALQMYGSRVIEKALEFIPSDQQNEMVRELDGHVLKCVKDQNGNHVVQKCIECVQPQSLQFIIDAFKGQVFALSTHPYGCRVIQRILEHCLPDQTLPILEELHQHTEQLVQDQYGSYVIEHVLEHGRPEDKSKIVAEIRGNVLVLSQHKFASNVVEKCVTHASRTERAVLIDEVCTMNDGPHSALYTMMKDQYANYVVQKMIDVAEPGQRKIVMHKIRPHIATLRKYTYGKHILAKLEKYYMKNGVDLGVDTGNGSQMELEIRPLFLVPDTNGFIDHLASLARLLESRKYILVVPLIVINELDGLAKGQETDHRAGGYARVVQEKARKSIEFLEQRFESRDSCLRALTSRGNELESIAFRSEDITGQLGNNADLILSCCLHYCKDKAKDFMPASKEEPIRLLREVVLLTDDRNLRVKALTRNVPVRDIPAFLTWAQVG mitoASRE (ND5) with D->A mutation(SEQ ID NO: 16) MLFNLRILLNNAAFRNGHNFMVRNFRCGQPLQNKVQDYKDDDDKEFGRSRLLEDFRNNRYPNLQLREIAGHIMEFSQDQHGSRFIELKLERATPAERQLVFNEILQAAYQLMVDVFGNYVIQKFFEFGSLEQKLALAERIRGHVLSLALQMYGSRVIEKALEFIPSDQQNEMVRELDGHVLKCVKDQNGNHVVQKCIECVQPQSLQFIIDAFKGQVFALSTHPYGCRVIQRILEHCLPDQTLPILEELHQHTEQLVQDQYGNYVIQHVLEHGRPEDKSKIVAEIRGNVLVLSQHKFANNVVQKCVTHASRTERAVLIDEVCTMNDGPHSALYTMMKDQYANYVVQKMIDVAEPGQRKIVMHKIRPHIATLRKYTYGKHILAKLEKYYMKNGVDLGVDTGNGSQMELEIRPLFLVPDTNGFIDHLASLARLLESRKYILVVPLIVINELDGLAKGQETDHRAGGYARVVQEKARKSIEFLEQRFESRDSCLRALTSRGNELESIAFRSEDITGQLGNNADLILSCCLHYCKDKAKDFMPASKEEPIRLLREVVLLTDDRNLRVKALTRNVPVRDIPAFLTWAQVG

It would be well understood by the ordinary artisan that the RNA endonucleases of this invention can be employed in various methods, both in vitro and in vivo. Thus, in one embodiment, the present invention provides a method of detecting a target RNA in a sample, comprising: a) contacting the sample with an RNA endonuclease of this invention under conditions whereby cleavage of RNA occurs if the target RNA is present in the sample and wherein the RNA binding domain of the RNA endonuclease is modified to bind the target RNA; and b) detecting a cleavage product of the target RNA, thereby detecting the target RNA in the sample. As a nonlimiting example, the RNA endonuclease of this invention can function at about pH 7.5 (e.g., in a range of about pH 7.0 to about pH 8.0) in the presence of a divalent metal cation. Activity is observed in the presence of manganese, magnesium and cobalt, with enzyme activity in the order of Mn²⁺ > Mg²⁺ > Co²⁺. The cleavage product can be visualized on urea-polyacrylamide or denaturing formaldehyde agarose gels by staining with ethidium bromide or by SYBR green dyes. The RNA digestion product can also be detected by radioactive methods as well as non-radioactive methods, including, e.g., DIG and cyanine dye labeled probes.

In addition, the present invention provides a method of cleaving a target mRNA in a sample, comprising contacting the sample with the RNA endonuclease of this invention under conditions whereby cleavage of the target mRNA occurs and wherein the RNA binding domain of the RNA endonuclease is modified to bind the target mRNA, thereby cleaving the target mRNA in the sample.

In yet further embodiments, the present invention provides a method of cleaving a target mRNA in a cell, comprising introducing into the cell the RNA endonuclease of this invention, wherein the RNA binding domain of the RNA endonuclease is modified to bind the target mRNA, under conditions whereby cleavage of the target mRNA occurs, thereby cleaving the target mRNA in the cell. The amount of cleavage of the target mRNA can be, for example, 1%, 2%, 3%, 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 98% or 100% as compared with a suitable control.

As nonlimiting examples, the following methods can be used to introduce expression vectors that encode ASRE in various cell types:

1. A nucleic acid vector (e.g., a plasmid vector) encoding the RNA endonuclease can be delivered directly to bacterial cells or cultured cells (e.g., mammalian cells) by electroporation.
2. A nucleic acid vector (e.g., a plasmid vector) encoding the RNA endonuclease can be delivered directly to bacterial cells by chemical transformation.
3. A viral vector (e.g., a retroviral vector, adenoviral vector, an adeno associated viral vector, an alphavirus vector, a vaccinia viral vector, a herpesviral vector, etc., as are known in the art) comprising a nucleotide sequence encoding the RNA endonuclease can be used to deliver the RNA endonuclease to cells (e.g., mammalian cells).
4. A baculovirus expression system can be used to deliver the RNA endonuclease to insect cells.
5. Agrobacterium mediated delivery can be employed in plants.
6. Lipid mediated delivery (e.g., lipofectamine, oligofectamine) can also be employed for mammalian cells.

In some embodiments, the RNA endonuclease of this invention can be directly introduced in various cell types using a membrane penetrating peptide (also known as a cell penetrating peptide). This can involve fusing the RNA endonuclease with the membrane penetrating peptide.

In various embodiments, the nucleotide sequence encoding the RNA endonuclease of this invention can be present in a cell transiently and/or can be stably integrated into the genome of the cell. The nucleotide sequence can also be stably expressed in the cell even without being integrated into the genome, via a plasmid or other nucleic acid construct as would be well known in the art.

In some embodiments, a systemic delivery of RNA endonuclease expression vectors into animals can be achieved by nanoparticles or viral vectors such as those commonly used in gene therapy and as are well known in the art.

The present invention provides a nucleic acid molecule comprising a nucleotide sequence encoding an RNA endonuclease of this invention, as well as a nucleic acid construct (e.g., vector, plasmid, etc.) comprising such a nucleic acid molecule and a cell comprising such a nucleic acid molecule and/or nucleic acid construct. A cell comprising a nucleic acid molecule, nucleic acid construct, vector and/or polypeptide of this invention is also provided herein. A composition comprising such a cell, nucleic acid molecule, nucleic acid construct, vector and/or polypeptide of this invention in a carrier, such as a pharmaceutically acceptable carrier, is further provided herein.

Also provided herein is a method of inhibiting (e.g., silencing) expression of a target gene in a cell, comprising introducing into the cell the RNA endonuclease of this invention, wherein the RNA binding domain of the RNA endonuclease is modified to bind mRNA encoding a gene product of the target gene, under conditions whereby cleavage of the mRNA occurs, thereby inhibiting (either partially or totally) expression of the target gene in the cell. Expression of the target gene can be inhibited, for example, by 1%, 2%, 3%, 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 98% or 100% as compared with a suitable control.

In some embodiments of the methods of inhibiting gene expression as described herein, the cell can be a bacterium (e.g., a pathogenic strain of a bacterium) or the cell can be in an organism, such as a parasite.

Nonlimiting examples of a bacterium of this invention include Escherichia coli, Bacillus anthracis, Bordetella pertussis, Borrelia burgdorferi, Brucella canis, Brucella melitensis, Brucella suis, Chlamydia pneumoniae, Chlamydia trachomatis, Clostridium botulinum, Clostridium difficile, Pseudomembranous colitis, Clostridium perfringens, Enterococcus faecalis, Enterococcus faecium, Legionella pneumophila, Neisseria gonorrhoeae and Yersinia pestis.

Nonlimiting examples of a parasite of this invention include Plasmodium falciparum, Toxoplasma gondii, Leishmania donovani and Trypanosoma cruzi.

The present invention further provides a method of cleaving a target mRNA in a mitochondrion in a cell, comprising introducing into the cell the RNA endonuclease of this invention, wherein the RNA binding domain of the RNA endonuclease is modified to bind the target mRNA in the mitochondrion and wherein the RNA endonuclease comprises a mitochondrial targeting signal sequence, under conditions whereby cleavage of the target mRNA in the mitochondrion occurs, thereby cleaving the target mRNA in the mitochondrion in the cell. This method can also be employed to target an RNA in the nucleus of a cell, as well as an RNA in a chloroplast of a plant, according to the methods described herein.

Additionally provided herein is a method of inhibiting expression of a target mitochondrial gene in a cell, comprising introducing into the cell the RNA endonuclease of this invention, wherein the RNA binding domain of the RNA endonuclease is modified to bind mRNA encoding a gene product of the target mitochondrial gene and wherein the RNA endonuclease comprises a mitochondrial targeting signal sequence, under conditions whereby cleavage of the target mRNA in the mitochondrion occurs, thereby inhibiting expression of the target mitochondrial gene in the cell. Nonlimiting examples of mitochondrial genes that can be targeted according to methods of this invention include NADH dehydrogenase (complex I) genes (e.g., MT-ND1, MT-ND2, MT-ND3, MT-ND4, MT-ND4L, MT-ND5, MT-ND6), Coenzyme Q-cytochrome c reductase/Cytochrome b (complex III) (e.g., MT-CYB), cytochrome c oxidase (complex IV) (e.g., MT-CO1, MT-CO2, MT-CO3) and ATP synthase (e.g., MT-ATP6, MT-ATP8). The RNA endonuclease can be introduced into the cell via a viral vector comprising a nucleotide sequence encoding the RNA endonuclease and in some embodiments, the viral vector can be an adeno associated viral vector. Furthermore, the cell of these methods can be in an organism, which can be a mammal and in some embodiments, is a human subject.

Additional embodiments of this invention include a method of treating dystrophia myotonica (DM) in a subject, comprising administering to the subject an effective amount of the RNA endonuclease of this invention, wherein the RNA binding domain of the RNA endonuclease is modified to bind mRNA encoding (CUG)n repeats in the 3′ UTR of DMPK to treat DM1 and/or mRNA encoding (CCUG)n repeats in intron 1 of the ZNF9 gene to treat DM2 and wherein the RNA endonuclease comprises a nuclear targeting signal sequence, thereby treating DM in the subject. Such a method can be expanded to encompass treating various trinucleotide repeat disorders (e.g., Alzheimer's disease) as are known in the art, by employing the teachings of this invention.
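As an illustration of how (CUG)n or (CCUG)n tracts might be located in a transcript sequence before choosing a binding site, the sketch below uses a regular expression to report each uninterrupted run of a repeat unit. The function name `repeat_tracts` and the default threshold of three repeats are assumptions made for the example; the specification does not define a minimum repeat number.

```python
import re


def repeat_tracts(rna: str, unit: str, min_repeats: int = 3):
    """Return (start_index, repeat_count) for each uninterrupted run of
    `unit` that is at least `min_repeats` copies long.

    `min_repeats` is an arbitrary illustrative threshold.
    """
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit.upper()), min_repeats))
    return [
        (match.start(), len(match.group()) // len(unit))
        for match in pattern.finditer(rna.upper())
    ]


if __name__ == "__main__":
    # Arbitrary example strings, not real DMPK or ZNF9 sequence.
    print(repeat_tracts("AACUGCUGCUGCUGAA", "CUG"))
    print(repeat_tracts("GGCCUGCCUGCCUGA", "CCUG"))
```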

The methods of the present invention can also be employed for treating infection by an RNA virus in a subject according to the teachings set forth herein.

Also as used herein, the terms “treat,” “treating” or “treatment” referto any type of action that imparts a modulating effect, which, forexample, can be a beneficial and/or therapeutic effect, to a subjectafflicted with a condition, disorder, disease or illness, including, forexample, improvement in the condition of the subject (e.g., in one ormore symptoms), delay in the progression of the disorder, disease orillness, prevention or delay of the onset of the disease, disorder, orillness, and/or change in clinical parameters of the condition,disorder, disease or illness, etc., as would be well known in the art.

As used herein “effective response” or “responding effectively” means apositive or beneficial response to a particular treatment in contrast toa “lack of an effective response” which can be an ineffectual, negativeor detrimental response as well as the lack of a positive or beneficialresponse. An effective response or lack of effective response (i.e.,ineffective response) is detected by evaluation, according to knownprotocols, of various immune functions (e.g., cell-mediated immunity,humoral immune response, etc.) and pharmacological and biologicalfunctions as would be known in the art.

“Effective amount” refers to an amount of a compound or composition of this invention that is sufficient to produce a desired effect, which can be a therapeutic and/or beneficial effect. The effective amount will vary with the age, general condition of the subject, the severity of the condition being treated, the particular agent administered, the duration of the treatment, the nature of any concurrent treatment, the pharmaceutically acceptable carrier used, and like factors within the knowledge and expertise of those skilled in the art. As appropriate, an “effective amount” in any individual case can be determined by one of ordinary skill in the art by reference to the pertinent texts and literature and/or by using routine experimentation. (See, for example, Remington, The Science And Practice of Pharmacy (20th ed. 2000)).

An exemplary dosage range for the administration to a subject of a nucleic acid molecule comprising a nucleotide sequence of this invention in the form of a viral vector can be, for example, from about 5×10¹² viral genomes per kg to about 5×10¹⁵ viral genomes per kg. One of skill in the art would be able to determine the optimal dose for a given subject and a given condition.

A method is also provided herein of detecting an RNA virus in a sample, comprising: a) contacting the sample with the RNA endonuclease of this invention under conditions whereby cleavage of RNA occurs if RNA of the RNA virus is present in the sample and wherein the RNA binding domain of the RNA endonuclease is modified to bind a target RNA of the RNA virus; and b) detecting a cleavage product of the target RNA, thereby detecting the RNA virus in the sample.

Additionally provided is a method of diagnosing infection by an RNA virus in a subject, comprising: a) contacting a sample from the subject with the RNA endonuclease of this invention under conditions whereby cleavage of RNA occurs if viral RNA (e.g., of the RNA virus) is present in the sample and wherein the RNA binding domain of the RNA endonuclease is modified to bind target viral RNA; and b) detecting a cleavage product of the target viral RNA, thereby detecting viral RNA in the sample and thereby diagnosing infection by an RNA virus in the subject. Nonlimiting examples of an RNA virus include retroviruses, alphaviruses, flaviviruses, etc., as are well known in the art.

Furthermore, the present invention provides a method of identifying a strain of an RNA virus (e.g., a particular strain of an RNA virus) in a sample, comprising: a) contacting the sample with the RNA endonuclease of this invention under conditions whereby cleavage of RNA occurs if RNA of the strain of the RNA virus is present in the sample and wherein the RNA binding domain of the RNA endonuclease is modified to bind a target RNA specific to the strain of the RNA virus; and b) detecting a cleavage product of the target RNA, thereby identifying the strain of the RNA virus in the sample.

Various methods described herein can be used, for example, to probe an RNA structure (e.g., characterize an unknown or partially identified RNA structure), as well as to detect and/or identify an RNA virus in a sample by detecting an RNA cleavage pattern that detects and/or identifies the RNA virus. The design of such methods, employing the ASREs of this invention, would be well known to those of skill in the art. As one example, the cleavage will provide a unique tool to fractionate single stranded RNA into small pieces prior to deep sequencing of RNA. The ASRE-mediated cleavage of this invention can also be used, for example, to distinguish single stranded RNA from double stranded RNA, and/or be used to probe the structure of RNA.

EXAMPLES Example 1

The present invention provides a protein “restriction enzyme” of RNAs, which specifically recognizes an 8-nt RNA sequence and makes a single cleavage in the target. As the data shown below demonstrate, these enzymes efficiently and specifically cleave diverse RNA targets not only in vitro, but also in bacterial cells and in mitochondria. In the cell-based data, the target is a reporter protein transcript carrying the target sequence, whereas the mitochondrial target is an endogenous gene. Thus, this invention provides new methods for in vitro detection or manipulation of RNA (e.g., generating an RNA digestion map) and for modulating (e.g., inhibiting) gene expression in organisms where interfering RNA (RNAi) machinery is not available.

Construction and Expression of ASREs

The PUF domain of human Pum1 (residues 828-1176) (SwissProt Accession No. Q14671 Pum1-human; incorporated by reference herein) and the PIN domain of human Smg6 (residues 1238-1421) (SwissProt Accession No. Q86US8; incorporated by reference herein) were amplified by PCR and joined to sequences encoding different peptide linkers designed according to the amino acid propensity of known protein linkers. The resulting fusion constructs were cloned in expression vector pT7HTb or pET43.1b, both of which encode an N-terminal His×6 tag; the latter also included a soluble Nus tag that can be removed by enterokinase digestion. To produce recombinant ASREs, the expression constructs were introduced into BL21(DE3) E. coli cells and expression was induced with 0.3 mM IPTG. The bacterial cells were disrupted by sonication in lysis buffer and the recombinant proteins were purified with a Ni-NTA column (His GraviTrap, GE Healthcare). Protein purity was assessed by SDS-PAGE and the proteins were further concentrated in 20 mM HEPES pH 7.0, 150 mM NaCl, 1 mM DTT, and stored at −20° C. in aliquots supplemented with 50% glycerol. Preparations were stable for 2-3 months under these conditions.

Nomenclature of ASREs

The ASREs were named according to the identity of their PUF domains, since most ASREs had the same PIN domain. Each PUF domain recognizes its 8-nt target sequence in an anti-parallel fashion, with the first repeat recognizing the 8^(th) base and so on.

WT PUF repeat: C terminus-8 7 6 5 4 3 2 1-N terminus
RNA target (5′ to 3′ ends): 5′-U G U A U A U A-3′

The targets of a modified PUF were named in a reverse order (from C to N terminus) with the number of the mutated PUF repeat and the base recognized by the corresponding repeat. For example, RNA target 7u6g represents the UugAUAUA sequence, with the mutated bases shown in lower case. The PUF domains were named by the mutated repeats and the number of the mutated amino acids in each repeat. For example, ASRE(6-2/7-2) has two mutated amino acids in each of the 6^(th) and 7^(th) repeats, causing it to recognize the 7u6g RNA. See Table 5 for more examples.
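The naming rule above can be sketched in a few lines of Python. This is only an illustration of the convention as described in this section: the function name `target_from_name` and the constant `WT_TARGET` are hypothetical, and the sketch assumes every name is a series of single-digit repeat numbers each followed by one base.

```python
# Wild-type target (NRE) bound by the unmodified Pum1 PUF domain, 5' to 3'.
WT_TARGET = "UGUAUAUA"


def target_from_name(name: str, wt: str = WT_TARGET) -> str:
    """Expand a target name such as '7u6g' into its 8-nt sequence.

    Each <repeat><base> pair gives a PUF repeat number and the base that
    repeat recognizes. Binding is anti-parallel, so repeat n reads RNA
    position 9 - n (repeat 8 reads position 1, repeat 1 reads position 8).
    Changed bases are written in lower case, as in the text.
    """
    if len(name) % 2 != 0:
        raise ValueError("name must be a series of <repeat><base> pairs")
    bases = list(wt)
    for i in range(0, len(name), 2):
        repeat, base = int(name[i]), name[i + 1].lower()
        if not 1 <= repeat <= len(wt) or base not in "acgu":
            raise ValueError(f"bad pair: {name[i:i + 2]}")
        bases[len(wt) - repeat] = base  # 0-based index of position 9 - n
    return "".join(bases)


if __name__ == "__main__":
    for name in ("7u6g", "3g", "5g3g1g", "8g7u6g2a1g"):
        print(name, target_from_name(name))
```

Applied to the names used in this document, the sketch reproduces the sequences given in the text, for example 7u6g as UugAUAUA and 8g7u6g2a1g as gugAUAag.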

Substrate Preparation and Digestion Reaction

DNA fragments containing single PUF domain binding sites were amplified by PCR and inserted between the HindIII and XbaI sites of pcDNA3. The resulting plasmids were linearized by overnight XbaI digestion, purified, and used as templates in an in vitro transcription reaction with T7 RNA polymerase (NEB). The reaction mixtures were treated further with RNase-free DNase I (Promega) to remove template and the RNA products were purified. All ASRE digestion reactions were performed in buffers containing 20 mM HEPES (pH 8.0), 150 mM NaCl, and 10% glycerol supplemented with respective divalent metal ions. Reactions were carried out at 37° C. for various times, and stopped by adding Ficoll-urea RNA-loading buffer and heating at 70° C. The resulting products were separated on 10% urea-PAGE gels, stained with ethidium bromide, scanned on a Typhoon Trio⁺ scanner, and quantified with Image-Quant software (GE Healthcare).

Determination of the RNA Cleavage Site by DSS-RACE

To determine the cleavage sites of ASREs, the 5′ and 3′ digestion products were extracted from denaturing urea-PAGE gels, and RACE was used to map both ends of the cleavage products. To map the 3′ end of the 5′ fragment, polyA polymerase was used to add a 10-30 nt poly-A tail at the end of the 5′ fragment, and the resulting RNA was reverse transcribed with a poly-T containing primer. The resulting cDNA was amplified by nested PCR, cloned in pcDNA3, and sequenced (FIG. 17, Panel A). To map the 5′ end of the 3′ cleavage fragment (FIG. 17, Panel B), the gel extracted RNA was first reverse transcribed with a specific primer. The resulting cDNA was purified and elongated with terminal transferase in the presence of dATP. Second strand synthesis was carried out using a poly-T-containing primer and Klenow fragment (NEB). The resulting product was PCR amplified, cloned and sequenced.

Gene Silencing by ASRE in Bacterial Cells

E. coli BL21(DE3) was transformed with either empty vector pET43.1b, or the vector encoding ASRE(lacZ) or control ASREs not targeting lacZ mRNA. Multiple colonies were selected in each sample to circumvent clonal variation. Protein expression of each clone was induced with 0.3 mM IPTG for 6 h at 37° C. To measure β-galactosidase activity, cells were lysed with two rounds of sonication in cold buffer and the standard β-galactosidase activity assay for all lysates was carried out in a 96 well plate format as described previously²⁷. Total protein concentrations of each sample were measured using the Bradford method (Pierce) to ensure that equal amounts of total protein were used in the activity assays.

To measure mRNA levels, total cellular RNA was purified with a Qiagen RNeasy kit and treated with DNase I. Equal amounts of total RNA (300-500 ng) from each sample were used to make first strand cDNA by random priming (High capacity cDNA reverse transcription kit from Applied Biosystems), and the lacZ mRNA was measured by q-PCR using a SYBR green kit from Applied Biosystems. Immunoblotting with a monoclonal β-galactosidase antibody (Santa Cruz cat #sc-56394) was used to measure β-galactosidase protein levels.

Construction and Expression of ASREs

The PUF domain of human Pum1 (residues 828-1176) (SwissProt Accession No. Q14671 Pum1-human; incorporated by reference herein) and the PIN domain of human Smg6 (residues 1238-1421) (SwissProt Accession No. Q86US8; incorporated by reference herein) were amplified with PCR primer pairs 1,2 and 3,4 (Table 4) using Roche Hi-fidelity Taq DNA polymerase. DNA fragments encoding the two domains were joined with sequences encoding different peptide linkers designed according to the amino acid propensity of known protein linkers. To change the linker length, different forward primers of the PIN domain were used that contained extra sequences encoding the linkers at their 5′ ends. The resultant products were cloned in expression vector pT7HTb or pET43.1b, both of which have an N-terminal 6×His tag that can be removed by enterokinase digestion. To produce recombinant ASRE, the expression constructs were introduced into BL21(DE3) E. coli cells and grown overnight in liquid LB medium with appropriate antibiotics. The saturated culture was freshly inoculated in LB medium and protein expression was induced with 0.3 mM IPTG at an OD₆₀₀ of 0.4-0.6. Cells were grown overnight at ambient temperature with a shaking speed not exceeding 170 rpm. The cells were then harvested and disrupted by sonication in lysis buffer (20 mM sodium phosphate pH 7.4, 500 mM NaCl, 20 mM imidazole, 1 mM 2-mercaptoethanol, 1 mM PMSF). Lysates were loaded onto a Ni-NTA column (His GraviTrap, GE Healthcare) and washed with the same buffer. Elution was obtained with a linear step gradient of imidazole from 50-300 mM and protein purity was assessed by SDS-PAGE. Imidazole was removed using a Millipore Amicon ultra centrifugal concentrator. The proteins were further treated with TEV (Promega) or enterokinase (NEB) to remove the His-tag or Nus-tag as per the manufacturer's instructions and the fragments containing the His-tag were removed by Ni-NTA. Proteins were further concentrated in 20 mM HEPES pH 7.0, 150 mM NaCl, 1 mM DTT and stored at −20° C. in aliquots supplemented with 50% glycerol. Preparations were stable over 2-3 months at −20° C.

Design of Linker Using Linker Database

ASRE peptide linkers were designed according to the amino acid propensity of known protein inter-domain linkers distinct from intra-domain loops. Residues Pro, Gly, Asp, Asn, His, Ser and Thr are preferred in intra-domain loop regions and, thus, were avoided. Thus, in some embodiments, a peptide linker of this invention can exclude Pro, Gly, Asp, Asn, His, Ser and/or Thr in any combination. A proline residue is highly likely in both loops and linkers; within loops, proline is usually involved in tight turns, whereas few proline turns are found in linkers. Most residues in known linkers adopt an α-helical structure, although a significant fraction of non-helical residues have a coil structure. Therefore linkers with slight helical propensity were chosen. The first two amino acids of the linkers came from the SalI restriction site used for construction of fusion proteins. A natural linker with known 3D structure was chosen from the linker database (fumarate reductase flavoprotein subunit; PDB 1qlaB_1, VDTGNWF, SEQ ID NO:17). The last two aromatic amino acids in the 12 Å linker were changed to glycine and serine (VDTGNGS, SEQ ID NO:18), thus changing the helical propensity to EECHHCC (see Table 2).

Substrate Preparation and Reaction Conditions

DNA fragments from plasmid pGZ3 containing various single PUF domain binding sites (25) were amplified with SmSubHF1/GUXbR1 primer pairs and cloned between the HindIII and XbaI sites of pcDNA3. The resulting plasmid DNA was linearized by overnight XbaI digestion and purified. The linearized DNA (0.2 μg) was used as template in a 50 μl in vitro transcription reaction containing T7 RNA polymerase (NEB) supplemented with 1 unit of Murine RNase inhibitor (NEB). The reactions were further treated with RNase-free DNase I (Promega) for 1 h at 37° C. and RNA products were isolated by ethanol precipitation.

All ASRE digestion reactions were performed in buffers containing 20 mM HEPES (pH 8.0), 150 mM NaCl, and 10% glycerol supplemented with respective divalent metal ions. A typical reaction contained 2 μg RNA substrate and 0.2-0.5 μg purified ASRE. Reactions were carried out at 37° C. for various times, and stopped by adding Ficoll-urea RNA-loading buffer and heating at 70° C. The resulting products were separated on 10% urea-PAGE gels, stained with ethidium bromide, scanned on a Typhoon Trio⁺ scanner, and quantified with Image-Quant software (GE Healthcare).

Kinetic Analysis

To measure the kinetic parameters of ASREs, various amounts of substrates (0.14 to 6 μM) were incubated with either 0.5 μg enzyme or buffer only (as undigested control) for 5 min, and equal volumes of digested products and undigested controls were separated in adjacent lanes of a denaturing PAGE gel. After scanning each gel, the undigested RNA bands were first quantified to ensure that their intensities were directly proportional to the RNA loaded (FIG. 16), which was achieved by adjusting the volume of samples loaded for each concentration pair (thus avoiding saturation). Then the substrate consumed at each input concentration was calculated using the relative ratio of remaining substrate to undigested controls. The resulting data were plotted with SigmaPlot (SYSTAT) and fitted to a Michaelis-Menten model using the enzyme kinetics module of SigmaPlot.
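The calculation described above can be illustrated with a short Python sketch. It is not the SigmaPlot procedure: for the sake of a dependency-free example it substitutes a Hanes-Woolf linearization ([S]/v plotted against [S]) for the nonlinear fit, and all function names and numerical values are hypothetical.

```python
def substrate_consumed(s0_uM, remaining_band, control_band):
    """Substrate consumed (uM), from the band-intensity ratio of the
    digested lane to its undigested control lane."""
    return s0_uM * (1.0 - remaining_band / control_band)


def initial_rate(consumed_uM, minutes=5.0):
    """Initial rate (uM/min) for a fixed-time reaction (5 min in the text)."""
    return consumed_uM / minutes


def fit_hanes_woolf(s, v):
    """Estimate (Vmax, Km) from a least-squares line through ([S], [S]/v).

    For Michaelis-Menten kinetics [S]/v = [S]/Vmax + Km/Vmax, so the
    slope is 1/Vmax and the intercept is Km/Vmax.
    """
    x = list(s)
    y = [si / vi for si, vi in zip(s, v)]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    vmax = 1.0 / slope
    return vmax, intercept * vmax


if __name__ == "__main__":
    # Hypothetical, noise-free rates generated from Vmax = 0.5, Km = 1.0.
    conc = [0.14, 0.5, 1.0, 2.0, 4.0, 6.0]
    rates = [0.5 * c / (1.0 + c) for c in conc]
    print(fit_hanes_woolf(conc, rates))
```

A linearized fit weights the data differently from direct nonlinear regression, so with real, noisy gel quantifications the two approaches can give somewhat different parameter estimates.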

Determination of the RNA Cleavage Site by DSS RACE

To determine the cleavage sites of ASRE, the 5′ and 3′ digestion products were extracted from denaturing PAGE gels, and RACE (rapid amplification of cDNA ends) was used to map both ends of the cleavage products. To map the 3′ end of the 5′ cleavage fragment, a poly-A tail was first added to purified RNA by incubating 100-300 ng RNA with polyA polymerase (NEB) and 2 mM ATP for 30 min following the manufacturer's instructions (FIG. 17, Panel A). An average of 10-30 nt was added at the end of the RNA. Products were subsequently used in the RT reaction with a poly-T containing primer (GUHIF1, Table 4), and the complementary DNA was used for nested PCR. The first PCR step involved SmSubHF1 and an anchor primer (GUHIF2), and the PCR product was again amplified using a nested primer (3′RACE BamF1) downstream of the transcription start site. The resulting product was cloned between the BamHI and HindIII sites in pcDNA3 and sequenced (FIG. 17, Panel A).

To map the 5′ end of the 3′ cleavage fragment (FIG. 17, Panel B), the gel extracted RNA (~75 nt) was reverse transcribed using a known reverse primer (GuXbR1, Table 4). The complementary DNA product was further purified by ethanol precipitation, and elongated with terminal transferase (3 units) in a 50 μl reaction supplemented with dATP for 30 min at 37° C. according to the manufacturer's instructions. The reaction was stopped by heating at 80° C. for 20 min and further diluted in EB buffer (Qiagen) to 0.5 ml. Second strand synthesis was carried out using a poly-T containing primer (GUHIF1) and Klenow fragment (NEB).

The resulting product was further PCR amplified using a 3′ nested primer (GUnestedXbR1) and a 5′ anchor primer (GUHIF2) harboring XbaI and HindIII sites. The resulting product was cloned in pcDNA3 and sequenced (FIG. 17, Panel B).

In Vivo Activity of ASRE

E. coli BL21(DE3) was transformed with either empty vector pET43.1b or vector encoding ASRE(lacZ). E. coli was also transformed with ASRE(87621) to serve as a non-specific control and with ASRE(lacZ) having a D1353A mutation as an inactive control. Multiple colonies were selected in each sample to circumvent clonal variation. Each clone was grown in liquid LB medium overnight, and the saturated culture was freshly inoculated (diluted 1:1000) in LB with an appropriate antibiotic and grown up to an OD₆₀₀ of 0.25-0.3 for induction. Protein expression was induced with 0.3 mM IPTG, and 5 mM Mn²⁺ was also added at this point. Bacterial cells were harvested 6 h after induction at 37° C. for the measurement of β-galactosidase activity. Two aliquots of each sample were snap frozen in an alcohol-dry ice mixture and stored at −80° C. for further analyses of RNA and protein levels.

To measure the β-galactosidase activity, the cells were lysed with two rounds of sonication in cold buffer (100 mM KPO₄ buffer pH 7.4, 2 mM 2-mercaptoethanol and 1 mM PMSF). Total protein concentrations of each sample were measured using the Bradford method (Pierce) to ensure that equal amounts of total protein were used in the activity assay. Standard β-galactosidase activity assays were performed for all samples in a 96 well plate format as described previously (27). Briefly, a total of 1-2 μg total protein was added to 300 μl assay reaction mixtures and incubated at 37° C. for 30 min, whereupon reactions were quenched by addition of Na₂CO₃ to 1 M. The amount of ONPG (ortho-nitrophenyl-β-galactoside) hydrolysis was determined spectrophotometrically in a POLARstar Omega plate reader (BMG Labtech GmbH, Germany), and activity units were calculated according to the formula: Miller units = 1000 × [A₄₂₀ − (1.75 × A₅₅₀)]/(T × V), where T is the time of the reaction before quenching with 1 M Na₂CO₃ and V is the volume of culture used in the assay. The units were normalized to total protein amount.
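The activity formula above can be written out as a small Python sketch. The function names are hypothetical, the example readings are invented for illustration, and the units of T (minutes) and V (ml) are assumptions based on common practice, since the text does not state them.

```python
def miller_units(a420, a550, minutes, volume_ml):
    """Miller units = 1000 x [A420 - (1.75 x A550)] / (T x V).

    The 1.75 x A550 term corrects the A420 reading for light scattering.
    T in minutes and V in ml are assumed units, not stated in the text.
    """
    if minutes <= 0 or volume_ml <= 0:
        raise ValueError("time and volume must be positive")
    return 1000.0 * (a420 - 1.75 * a550) / (minutes * volume_ml)


def normalize_to_protein(units, protein_ug):
    """Express activity per microgram of total protein in the assay."""
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    return units / protein_ug


if __name__ == "__main__":
    # Hypothetical plate-reader values for a single well.
    units = miller_units(a420=0.9, a550=0.1, minutes=30.0, volume_ml=0.1)
    print(units, normalize_to_protein(units, protein_ug=2.0))
```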

To measure RNA levels, frozen cells were thawed on ice, resuspended in 500 μl TE buffer (pH 7.9) with 1 mg/ml of lysozyme, and incubated at room temperature for 15 min. Two rounds of sonication were then used to lyse the cells efficiently. Cell debris was pelleted by 5 min centrifugation at 15000 rpm in a microcentrifuge, and the supernatants were used for RNA purification with a Qiagen RNeasy mini kit. RNA eluted from the columns was treated with 2 U of DNase I (Promega) at 37° C. for 30 min, followed by heat inactivation of the DNase. For real time RT-PCR analyses, equal amounts of total RNA (300-500 ng) were used to make first strand cDNA by random priming (High capacity cDNA reverse transcription kit from Applied Biosystems, cat. no. 4368814). Q-PCR was performed using lacZ specific primers (Table 4) and a SYBR green kit (Applied Biosystems), and the results were calibrated to the expression level of ftsZ using gene specific primers (FtsZF1, FtsZR1 in Table 4). The data analysis was performed using ABI-prism q-RT PCR software.
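As a rough illustration of calibrating lacZ signal to the ftsZ reference, the following Python sketch uses the common comparative Ct (2^-ΔΔCt) calculation. This is an assumption: the text states only that the instrument software was used, not which model it applied. The function name and the Ct values are hypothetical, and the calculation presumes roughly 100% amplification efficiency for both primer pairs.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target transcript relative to a control sample.

    ct_target / ct_ref: Ct of the target (e.g., lacZ) and the reference
    (e.g., ftsZ) in the test sample; the *_ctrl values are the same
    measurements in the control sample (e.g., empty vector).
    """
    delta_sample = ct_target - ct_ref
    delta_control = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(delta_sample - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values: the target appears two cycles later in the
    # test sample than in the control, with an unchanged reference.
    print(relative_expression(24.0, 18.0, 22.0, 18.0))
```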

To determine the β-galactosidase protein level by immunoblotting, total bacterial protein was extracted by sonication in 100 mM KPO₄ buffer pH 7.4, 2 mM 2-mercaptoethanol and 1 mM PMSF. Protein concentrations were measured by Bradford assay (Pierce) and equal amounts of total protein were loaded in each well of SDS-PAGE gels. The blots were probed using a monoclonal β-galactosidase antibody (1:1000; Santa Cruz cat. no. sc-56394) and detected using ECL-Western Blot reagent (GE Healthcare). Blots were stripped using Restore Plus western blot stripping buffer (Pierce) and re-probed with anti-GroEL antibody (1:2000; Sigma cat. no. G6532) as a loading control.

Design Principle of Artificial Sequence-Specific RNA Endonucleases(ASREs)

To engineer ASREs, a modular design was adopted by combining a target recognition domain and a catalytic domain. For the target recognition domain, the unique RNA recognition domain of PUF proteins (named for Drosophila Pumilio and C. elegans fem-3 binding factor)¹¹ was chosen. Although most sequence-specific RNA-binding proteins recognize their targets through RRM or K homology (KH) domains that bind to short RNA elements with moderate affinities, it is impractical to engineer an RNA recognition module using these domains due to their weak RNA binding affinity and the absence of a predictive RNA recognition code¹². On the other hand, the RNA-binding domain of human Pumilio1 (PUF domain) contains eight repeats that recognize eight consecutive RNA bases, with each repeat recognizing a single base¹³ (FIG. 1, Panel A, left). Moreover, two amino acid side chains in each PUF repeat recognize the Watson-Crick edge of the corresponding base and determine the specificity of that repeat¹³,¹⁴. Using this recognition code, a PUF domain can be designed to specifically bind most 8-mer RNA sequences¹³,¹⁴.

For the RNA cleavage module, a small endonuclease domain with limited exonuclease activity was used. One good candidate is the PIN domain (PilT N-terminus) of SMG6, a key factor involved in nonsense-mediated decay (NMD). This domain has well-defined molecular architecture and requires only a divalent metal cation for sequence-independent RNA cleavage (FIG. 1, Panel A, right).

The next objective was to join the target recognition module and the RNA cleavage module by a short peptide linker flexible enough to give both domains easy access to RNA but rigid enough to prevent non-specific RNA cleavage. To define the linker sequence and length, a rational design based on the amino acid propensity model of natural linkers in multi-domain proteins¹⁵ was initiated. In addition, aromatic amino acids were excluded due to their potential to interact with base pairs by stacking, and proline was excluded because it can potentially form turns or cis-trans isomers that negatively affect domain independence. Based on these criteria, a heptapeptide linker (VDTGNGS; 12 Å, SEQ ID NO:18) was selected initially.

Using the selected target recognition domain, catalytic domain, and initial linker sequence, the architecture of this design was first tested by placing the PUF domain at the N-terminus and the PIN domain at the C-terminus and vice versa. An ASRE was constructed by fusing an N-terminal modified PUF(6-2/7-2) containing mutations N1043S/Q1047E in repeat 6 and S1079N/E1083Q in repeat 7 to specifically bind UugAUAUA (7u6g)¹⁴ to a C-terminal wild type PIN or a mutated PIN (D1353A mutation in PIN active site). The recombinant ASREs (PUF-PIN fusion proteins) were expressed in E. coli and purified to homogeneity (FIG. 13, Panel A, lanes 2 and 3). As shown in FIG. 1, Panel B, incubation of the ASRE(6-2/7-2) with an in vitro transcribed RNA substrate (191-nt in length) containing a single 8-nt recognition site 7u6g led to the rapid cleavage of the substrate into two fragments (lane 2), whereas the mutated ASRE(6-2/7-2) containing an inactive PIN domain had no detectable activity (lane 3). These observations indicated that the cleavage was catalyzed by the PIN domain rather than contaminating nucleases. An inverted ASRE that contains N-terminal PIN and C-terminal PUF domains was also expressed and purified. However, when incubated with the same substrate under identical conditions, this enzyme caused nonspecific cleavage of RNA, resulting in total digestion of the entire RNA substrate (FIG. 1, Panel C). Therefore, the N-terminal PUF-C-terminal PIN orientation was chosen for the rest of this study.

Effect of ASRE Linker Length on its Activity

Studies were carried out to determine how different linkers affected the activities of ASRE. A suitable linker is important for the high catalytic activity of ASRE. A glycine or Gly/Ser rich linker may be too flexible and unstable and, thus, could act as an energetic, structural, or activity-interfering nuisance, especially when it is longer than necessary to connect two domains¹⁶. On the other hand, a short linker may generate a structural barrier that prevents simultaneous contact of the two domains with an RNA. A database of known linkers revealed that both short linkers (less than 6 amino acids) and long linkers (more than 14 amino acids) are very rare among natural linkers and that the average linker length is 8-10 amino acids¹⁵. To optimize the ASRE linker length, a tripeptide (VDT; 7.3 Å), a heptapeptide (VDTGNGS; 12 Å, SEQ ID NO:18) and a dodecapeptide (VDRRMARDGLVH; 20.5 Å, SEQ ID NO:19) linker were designed and each was inserted between PUF(6-2/7-2) and wild type PIN. These peptides have mixed helical propensity and should provide limited flexibility to prevent non-native interactions between domains.

As shown in FIG. 2, Panel A, the purified ASRE with the tripeptide linker had very low activity compared to the other two enzymes (lane 2). The enzymes had considerably higher activities with linker lengths of 7 aa or 12 aa (lanes 3 and 4). However, non-specific cleavage products were observed at longer incubation times with the 12-aa linker (FIG. 2, Panel A, lane 5). Thus the ASRE with the heptapeptide linker was used in further studies described herein. Furthermore, the ASRE containing the heptapeptide linker completed RNA cleavage within two hours (FIG. 2, Panel B), and displayed strict pH selectivity with reduced reaction rate below pH 7.5 and no detectable cleavage at pH 6.0.

Sequence Specificity and the Ion Requirement of ASREs

To confirm that the engineered ASREs mediate sequence specific RNA cleavage, an ASRE containing the wild type PUF domain that recognizes a different 8-nt target, the nanos response element (NRE: UGUAUAUA), was created. The NRE differs from the PUF(6-2/7-2) target by two nucleotides¹⁴. As shown in FIG. 2, Panel C, this ASRE(wt) cleaved only the substrate containing its cognate target NRE (lane 3) but not the closely related RNA 7u6g (lane 2). Conversely, the ASRE(6-2/7-2) specifically cleaved its cognate target (7u6g) but not the substrate containing NRE (FIG. 2, Panel C, lanes 6 and 5, respectively). In addition, the ASRE(6-2/7-2) failed to cleave other closely related RNAs, including UGUAUgUA (3g), UGUgUgUg (5g3g1g) and gugAUAag (8g7u6g2a1g, or 87621 in short) that vary by 3 or 6 nucleotides from the ASRE(6-2/7-2) substrate 7u6g (FIG. 14). These data indicated that the activities of ASREs are highly sequence specific.

The active site of the PIN domain is lined by three conserved aspartate residues (D1251, D1353, D1392) that coordinate one divalent metal cation to activate a water molecule for nucleophilic attack of the 3′-5′ phosphodiester bonds¹⁷,¹⁸. To determine the metal ion preference of ASRE, the reaction was carried out in the presence of different divalent metal ions including Mn²⁺, Co²⁺, Ca²⁺ and Mg²⁺. Consistent with the metal ion selectivity of the wild-type PIN domain¹⁷, optimal activity of ASRE was detected in the presence of Mn²⁺ and suboptimal activity was detected in the presence of Mg²⁺ (17). ASRE also had limited activity in the presence of Co²⁺ (FIG. 2, Panel D, lane 5), suggesting that the PIN domain may be able to use Co²⁺ as a low activity substitute. In addition, an increase in the concentration of Mn²⁺ led to higher ASRE RNA cleavage activity, as judged by the apparent rate constants calculated with a pseudo-first order reaction model (FIG. 2, Panel E).

Enzyme Kinetics of ASREs

The reactions catalyzed by ASREs in these assay conditions followed Michaelis-Menten-like kinetics. As shown in FIG. 3, Panels A and B, the initial cleavage rates in 5-minute reactions using different concentrations of cognate substrates were best fitted to a Michaelis-Menten model (a representative gel is shown in FIG. 15). Furthermore, the kinetic parameters of four related ASREs (Table 1) indicate that ASRE-catalyzed cleavage was fairly efficient (k_cat/K_m in the 10⁷ M⁻¹ min⁻¹ range; this number is likely an underestimate, as 100% of the enzyme was assumed to be active). Most of the ASREs showed very little to no activity with non-cognate RNA substrates, with the exception of ASRE(671) toward non-cognate 7u6g RNA (UugAUAUA). This non-cognate activity, however, displayed more than a five-fold decrease in V_max compared to ASRE(671)'s cognate substrate (UugAUAUg), which has a single base difference at the end of the 8-nt target. This non-cognate activity may be explained by the fact that the evolutionary plasticity of the PUF domain allows recognition of suboptimal target sequences.

In addition, consistent with the inability of Pumilio1 to bind DNA¹³, it was found that the presence of excess single-stranded DNA in the reaction mixture did not interfere with RNA cleavage (FIG. 16). These data further validate the specificity of ASREs as sequence-specific RNA enzymes.

Determining Cleavage Site of ASREs

These data indicate that ASRE-mediated RNA cleavage occurs near the cognate binding site, generating two products whose lengths roughly add up to the length of the input RNA. To determine the exact site of ASRE cleavage, both the 5′ and 3′ digestion products (FIG. 17, Panels A and B) of ASRE were purified and cloned using 5′ and 3′ DSS-RACE (digestion size selection and rapid amplification of cDNA ends). The 5′ and 3′ digestion products were purified from the same gel, amplified by RACE, cloned into a plasmid vector and sequenced. Forty clones in total were sequenced to determine the 5′ and 3′ sites, and only sites that were mapped by multiple clones were considered, since singly identified sites could be an artifact of the amplification process. Due to the limited terminal transferase (TdT) activity of the reverse transcriptase¹⁹, an extra residue (mostly a G or A) was added in many of the 5′ RACE products (FIG. 18).
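The filtering rule described above, keeping only cleavage sites supported by more than one sequenced clone, can be sketched in Python as follows. The function name and the example positions are hypothetical and do not correspond to the mapped sites in the figures.

```python
from collections import Counter


def supported_sites(clone_sites, min_clones=2):
    """Count clones per mapped cleavage position and keep only positions
    supported by at least `min_clones` independent clones."""
    counts = Counter(clone_sites)
    return {site: n for site, n in sorted(counts.items()) if n >= min_clones}


if __name__ == "__main__":
    # Hypothetical mapped positions, one entry per sequenced clone.
    mapped = [101, 101, 101, 96, 96, 110]
    print(supported_sites(mapped))  # position 110 is a singleton and is dropped
```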

As shown in FIG. 3, Panel C and FIG. 18, two cleavage sites were identified from both 5′ and 3′ RACE, which are labeled as sites 1 and 2 according to their positions in the substrate. The two cleavage products can be matched to the same site, suggesting that an ASRE makes only a single, distinct cleavage rather than several random cleavages. The digested RNA products can be cloned with the DSS-RACE protocol (FIG. 17), indicating that RNA cleavage catalyzed by ASRE generates a 5′ fragment with a 3′ hydroxyl group and a 3′ fragment with a 5′ phosphate.

The major digestion site (site 2) lies 4 bases downstream from the PUF binding site and accounts for ˜80% of digestion products, whereas the minor cleavage site lies in the third position of the PUF-binding 8-nt sequence. Since PUF binds RNA in an anti-parallel fashion with the first repeat at the N-terminus recognizing the 8^(th) position of the RNA and the PIN domain is at the C-terminus of the ASRE, such a cleavage pattern is somewhat surprising. This result indicated that ASREs might form a foldback structure with the RNA substrate being bound by one arm (PUF domain) and the PIN catalytic domain folded back to cleave the phosphodiester backbone at downstream sites (FIG. 3, Panel D). However, the fact that RNA can also form secondary and tertiary structure in solution adds complexity to this model. Minor cleavage sites were also determined near the PUF binding sequence that were mapped by single clones; such sites could be accounted for by experimental artifact of RNA amplification with RACE, by incomplete ASRE digestion, or by flexibility of the ASRE-RNA complex.

Silencing Gene Expression with ASRE in Living Cells

The in vitro sequence-specific RNA cleavage activity of ASREs raised the possibility that these enzymes could specifically cleave a target RNA, e.g., mRNA, in cells, thereby silencing gene expression. Such an application would be especially useful in organisms where RNA interference (RNAi) machinery does not exist. As a proof of concept, the bacterial lacZ transcript was targeted. Bacteria were selected because: (1) bacteria do not express endogenous Pumilio-like protein homologues, and thus are cleaner systems to start with, (2) lacZ gene expression and regulation are very well understood in E. coli, and (3) ASREs are expressed as active forms in bacterial cells.

To target the lacZ transcript, a modified ASRE, ASRE(LacZ) (orASRE(6g3g2a) in the nomenclature used herein), was engineered with amutant PUF domain (repeats 2,3,6) that specifically recognizes thetarget sequence UGGAUGAA, which occurs twice (position 1232-1239 and1520-1527) in the lacZ mRNA. BL21(DE3) cells, in which expression of theLacZ is under the control of IPTG, were transformed with the expressionvectors for either ASRE(LacZ) or control ASREs. As shown in FIG. 4,Panel A, after IPTG induction, the clones expressing ASRE(LacZ) hadsignificantly decreased β-galactosidase activity compared to emptyvector controls or non-specific ASRE controls. Consistent with the invitro results (FIG. 1, Panel B), the D1353A mutation in the PIN domainof ASRE(LacZ) significantly relieved the gene silencing effect ofASRE(LacZ), suggesting that the decrease in β-galactosidase activity wasmost likely due to mRNA degradation rather than translationalinhibition.
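The relationship between the target sequence and the mutated repeats can be sketched in a few lines. This is an illustrative sketch only. It assumes the wild-type PUF target UGUAUAUA given elsewhere in this specification, the anti-parallel rule that repeat r contacts base 9 − r of the 8-nt site, and a label format that lists each changed repeat followed by its new base in lower case. The function names are hypothetical.

```python
# Wild-type PUF target quoted elsewhere in this specification.
WILD_TYPE_TARGET = "UGUAUAUA"


def repeat_changes(target, wild_type=WILD_TYPE_TARGET):
    """Return (repeat_number, new_base) pairs for bases that differ from wild type.

    Assumes anti-parallel binding: repeat 1 reads the last base of the site,
    so the base at 1-based position p is read by repeat (n + 1 - p).
    """
    if len(target) != len(wild_type):
        raise ValueError("target and wild-type site must have the same length")
    n = len(wild_type)
    changes = []
    for pos, (wt_base, new_base) in enumerate(zip(wild_type, target), start=1):
        if wt_base != new_base:
            changes.append((n + 1 - pos, new_base))
    # List the highest-numbered repeat first, as in the label ASRE(6g3g2a).
    return sorted(changes, reverse=True)


def asre_label(target):
    """Build a label such as '6g3g2a' (assumed format) from the repeat changes."""
    return "".join(f"{repeat}{base.lower()}" for repeat, base in repeat_changes(target))


print(asre_label("UGGAUGAA"))  # 6g3g2a
```

For the lacZ target UGGAUGAA, the three differences from the wild-type site fall at positions 3, 6 and 7, which map to repeats 6, 3 and 2, in agreement with the repeats named above.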

To investigate whether the ASRE decreases the β-galactosidase activity by cleaving mRNA, the steady state level of LacZ mRNA was measured by real-time RT-PCR. As shown in FIG. 4, Panel B, the expression of ASRE(LacZ) resulted in a significant decrease in LacZ mRNA compared to strains containing an empty vector or control ASRE. In line with the mRNA levels and the β-galactosidase activity assays, the protein levels of β-galactosidase were significantly decreased in clones expressing ASRE(LacZ) as judged by western blots (FIG. 19). This effect is not due to differential expression of ASREs (FIG. 19, bottom panel), as the ASRE levels in all clones were roughly equal.

The discovery of type II DNA restriction enzymes 40 years ago marked thebirth of the “recombinant DNA” era of modern biology^(20,21). Inaddition to native DNA restriction enzymes, artificial enzymes (akazinc-finger nucleases, ZFNs) that combine a zinc finger DNA-bindingdomain and a DNA-cleavage domain²² have also been designed to targetunique sequences within complex genomes²³. However, restriction enzymesof RNA have not been discovered in nature and the creation of artificialenzymes resembling ZFNs has been proven difficult, primarily due tolimited understanding of an RNA recognition code between RNA andprotein¹². In the present invention, an RNA “restriction enzyme” (ASRE)was created. The generation of this novel enzyme and othersequence-specific derivatives enables efficient and specific cleavage ofdiverse RNA targets both in vitro and in vivo.

The keys for successful generation of ASREs include the choice of a sequence-specific RNA binding domain, a suitable endonuclease domain, an optimal linker, and a correct orientation of these elements. These data indicate that the unique RNA recognition mode of the PUF domain gives ASREs the ability to specifically recognize a diverse panel of RNA targets without detectable cross activity between non-cognate ASRE/RNA pairs. These data also demonstrate that the PIN domain of human SMG6 cleaves RNA at specific sites only when fused to the C-terminus of the PUF domain, not vice versa (FIG. 1, Panel C). One possible explanation is that the PIN in the PUF-PIN orientation can “fold back” so that the active site faces the PUF domain to specifically cleave RNA (FIG. 3, Panel D and FIG. 20, Panel A), whereas in the PIN-PUF orientation, the PIN active site faces away from PUF to cleave any nearby RNA (FIG. 20, Panel B). Alternatively, the C-terminal residues of PIN could be very flexible when fused with the peptide linker¹⁷, probably allowing non-specific RNA sequences to become accessible to the PIN active site when ASRE is constructed in a PIN-PUF orientation (FIG. 20, Panel C). Medium-sized linkers (7-12 amino acids) rich in slightly polar hydrophilic amino acids are best suited for the design. In addition, linkers with helical or helix-coil-helix structures produced the most active ASREs, whereas linkers with helix-turn-helix or 3-10 helix structures reduced the activity of ASREs. Furthermore, the linker length also affects the specificity of the enzyme: non-specific cleavage became apparent with linkers approaching ~20 Å in length, probably due to excess flexibility that allowed the PIN endonuclease domain to recognize and cleave any RNA.

Several lines of evidence confirm that the RNA cleavage reaction of ASREs is indeed catalyzed by the PIN domain of SMG6: (1) ASRE has the same cation preference as that of PIN, both requiring Mn²⁺ for maximum activity¹⁷; (2) a mutation in the active site of the PIN domain abolished ASRE activity; and (3) non-specific RNA cleavage was observed when the PIN domain had too much flexibility to recognize any RNA (either with a long linker or a reverse PIN-PUF orientation). The reaction catalyzed by ASREs was fairly efficient as judged by the k_(cat)/K_(m) value (>10⁷ min⁻¹ M⁻¹). The Michaelis constants (K_(m)) of the ASREs were significantly higher than the dissociation constant of a typical PUF with its cognate sequence¹⁴, suggesting that the ASRE K_(m) is determined mainly by the interaction of PIN with RNA and the catalytic rate of PIN. In fact, a tight RNA-protein interaction may lead to slow turnover of the enzyme and affect its reaction rate. Therefore, the activity of ASREs may be improved by using a catalytic domain more efficient than the PIN domain.
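One way to read the comparison between K_(m) and the dissociation constant is through the textbook single-substrate Michaelis-Menten scheme. The scheme and the rate constants k_1, k_(-1) below are assumptions introduced for illustration; the specification reports only k_(cat)/K_(m) and K_(m), and the K_(d) of this simple scheme need not equal the measured PUF-RNA dissociation constant.

```latex
% Assumed scheme: E + S <-> ES -> E + P, with rate constants k_1, k_{-1}, k_{cat}
\[
  K_d = \frac{k_{-1}}{k_{1}}, \qquad
  K_m = \frac{k_{-1} + k_{cat}}{k_{1}} = K_d + \frac{k_{cat}}{k_{1}} \;\ge\; K_d .
\]
% At substrate concentrations well below K_m the rate is governed by k_cat/K_m:
\[
  v = \frac{k_{cat}\,[E]_0\,[S]}{K_m + [S]}
  \;\approx\; \frac{k_{cat}}{K_m}\,[E]_0\,[S]
  \quad \text{for } [S] \ll K_m .
\]
```

Under these assumptions K_(m) exceeds K_(d) by the term k_(cat)/k_1, so a K_(m) above the binding constant is what would be expected when the catalytic step contributes appreciably.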

For ASRE-mediated mRNA degradation in E. coli cells, the expression of both ASRE, driven by the T7 promoter, and chromosomal lacZ, driven by the lacZ promoter/operator, was induced simultaneously with IPTG. Since T7 RNA polymerase is almost 8 times faster than E. coli RNA polymerase²⁴, ASRE should have been synthesized faster than lacZ, providing a time window for the ASRE to cleave lacZ mRNA. Due to the absence of a nuclear membrane, E. coli mRNAs undergo co-transcriptional translation, resulting in bacterial mRNA being constantly bound and protected by ribosomes. The observation that ASRE(LacZ) could specifically silence lacZ mRNA indicates that the ASRE can recognize its target in vivo with sufficiently high efficiency and affinity to overcome the protective effect of ribosomes. As an added benefit, the co-transcriptional binding of mRNA by ribosomes in bacteria may limit low affinity off-target effects of ASREs, as only the specific ASRE with high binding affinity will compete efficiently with ribosomes. The silencing effect was most likely due to mRNA degradation rather than a translational block, as an ASRE with a mutation in the PIN domain active site did not affect the β-galactosidase activity. In addition, mRNA degradation was also confirmed by q-RT-PCR analysis. Using a similar strategy, other gene product(s) may also be targeted in organisms where RNAi does not exist.

The present invention also includes various ways to optimize and expandASRE usage. First, although the PUF domain can specifically recognize an8-nt sequence, it may be desirable to create ASREs that recognize asubstrate having a target sequence of different length. For example, thein vivo use of ASRE to silence gene expression would presumably benefitfrom a longer ASRE binding site that can minimize off-target effects,whereas the in vitro use of ASRE to probe RNA structure or sequence mayrequire a shorter targeting site to produce multiple cleavages in asingle substrate. It should be possible to increase the length ofrecognition site by adding more PUF repeats, or by using multiple PUFsin a single fusion protein. Conversely, it may be possible to decreasethe length of the recognition site by relaxing the specificity of somePUF repeats (i.e., making some repeats to bind all four bases equallywell). In addition, the catalytic activity of ASRE may also be improvedby using riboendonuclease domains other than PIN, or by optimizing thePIN domain active site to increase activity. Finally, the relativepositions of the cleavage site may be affected by the conformation ofthe linker region; thus, testing peptide linkers with differentsequences and structures will likely reduce the “star activity” of ASREsso that cleavage occurs at only a single, predictable site.
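The trade-off between recognition-site length and off-target cleavage can be illustrated with a back-of-the-envelope estimate. The sketch below assumes a random sequence with the four bases equally frequent and independent, which real transcripts do not strictly satisfy; the function name is hypothetical.

```python
def expected_random_sites(sequence_length, site_length):
    """Expected number of chance matches to one specific site in a random RNA.

    Assumes equal, independent base frequencies, so any window of
    `site_length` bases matches with probability 1 / 4**site_length.
    """
    windows = max(sequence_length - site_length + 1, 0)
    return windows / 4 ** site_length


# Compare an 8-nt site with a hypothetical 10-nt site on a 1,000,000-nt pool.
for n in (8, 10):
    print(n, expected_random_sites(1_000_000, n))
```

Under this simple model each additional recognized base lowers the expected number of chance matches fourfold, which is the rationale for a longer site in vivo and a shorter site when multiple cleavages per substrate are wanted.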

References for Example 1

1. O. Takeuchi and S. Akira, Immunol Rev 227 (1), 75 (2009).
2. D. P. Bartel, Cell 136 (2), 215 (2009).
3. S. J. Baker, J. L. Morris, and I. L. Gibbins, Brain Res Mol Brain Res 111 (1-2), 136 (2003).
4. I. J. MacRae and J. A. Doudna, Curr Opin Struct Biol 17 (1), 138 (2007).
5. J. J. Champoux and S. J. Schultz, FEBS J 276 (6), 1506 (2009).
6. W. G. Scott, Curr Opin Struct Biol 17 (3), 280 (2007).
7. H. Yoshida, Methods Enzymol 341, 28 (2001).
8. Y. Tomari and P. D. Zamore, Genes Dev 19 (5), 517 (2005).
9. T. W. Nilsen, Bioessays 25 (12), 1147 (2003).
10. S. K. Silverman, Nucleic Acids Res 33 (19), 6151 (2005).
11. M. Wickens, D. S. Bernstein, J. Kimble et al., Trends Genet 18 (3), 150 (2002).
12. S. D. Auweter, F. C. Oberstrass, and F. H. Allain, Nucleic Acids Res 34 (17), 4943 (2006).
13. X. Wang, J. McLachlan, P. D. Zamore et al., Cell 110 (4), 501 (2002).
14. C. G. Cheong and T. M. Hall, Proc Natl Acad Sci USA 103 (37), 13635 (2006).
15. R. A. George and J. Heringa, Protein Eng 15 (11), 871 (2002).
16. P. Argos, J Mol Biol 211 (4), 943 (1990).
17. F. Glavan, I. Behm-Ansmant, E. Izaurralde et al., EMBO J 25 (21), 5117 (2006).
18. E. Huntzinger, I. Kashima, M. Fauser et al., RNA 14 (12), 2609 (2008).
19. D. Chen and J. T. Patton, Biotechniques 30 (3), 574 (2001).
20. H. O. Smith and K. W. Wilcox, J Mol Biol 51 (2), 379 (1970).
21. T. J. Kelly, Jr. and H. O. Smith, J Mol Biol 51 (2), 393 (1970).
22. T. Cathomen and J. K. Joung, Mol Ther 16 (7), 1200 (2008).
23. V. K. Shukla, Y. Doyon, J. C. Miller et al., Nature 459 (7245), 437 (2009).
24. I. Iost and M. Dreyfus, EMBO J 14 (13), 3252 (1995).
25. Y. Wang, C. G. Cheong, T. M. Hall et al., Nat Methods 6 (11), 825 (2009).
26. J. Tilsner, O. Linnik, N. M. Christensen et al., Plant J 57 (4), 758 (2009).
27. K. L. Griffith and R. E. Wolf, Jr., Biochem Biophys Res Commun 290 (1), 397 (2002).

Example 2

In addition to being the organelle in energy production for eukaryoticcells, the mitochondrion plays a critical role in myriad cellularprocesses such as control of apoptosis and ROS (reactive oxygen species)signaling. Thus mitochondria dysfunction is linked to various diseasessuch as cancer, autism and age-associated neurodegenerative diseases.Most mitochondrial proteins are coded by nuclear DNA and imported intomitochondria after translation. Mitochondria have a distinct genome, andthe human mitochondrial genome contains 13 protein-coding, 2 rRNA, and22 tRNA genes (FIG. 21, Panel A). Although mitochondrial gene mutationsare closely linked to various diseases, the functions of these genes arehard to study because there are limited research tools to manipulatemitochondrial gene expression. A new ASRE with a mitochondrial targetingsignal has been generated that can be used to specifically silencemitochondrial gene expression by cleaving mitochondrial RNA, making itpossible to determine the function of each gene in the mitochondrialgenome.

An 8-nt target sequence (TTTATGTG) in the subunit 5 of the respiratory complex I (mtND5) gene was selected for these studies (FIG. 21, Panel A). The sequence is a unique hit in the 16.5 kb mitochondrial genome, thus ensuring minimal off-target effects. To facilitate the translocation of ASRE into mitochondria, the N-terminal mitochondrial targeting peptides from the ornithine transcarbamylase (OTC) enzyme were used, which are cleaved by mitochondrial protease after protein translocation (FIG. 21, Panel B). The resulting enzyme, mitochondrial ASRE (mitoASRE), is translocated into mitochondria by natural cellular machinery and specifically induces gene silencing. As a control, ASRE(ND5) was engineered, which lacks mitochondrial targeting signals, as well as a mutated ASRE(ND5) that contains a mutation in the active site of the PIN domain. To confirm the design, the initial study was to transiently transfect HeLa and HEK293 cells with N-terminal Flag-tagged ASREs. It was found that the mitoASRE and mutated ASRE(ND5) could be successfully translocated to the mitochondrial matrix, and the control ASRE(ND5) that lacks a mitochondrial targeting sequence is mainly located in the cytoplasm. Real time PCR showed a 30% decrease of mtND5 transcript compared to control cells or cells expressing the inactive DA control. A concomitant decrease in protein level was also noticed, as shown by western blot analysis. Cells also showed a slow growth phenotype on expression of mtND5 ASRE.
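The uniqueness check for an 8-nt target can be sketched as a simple scan. The sketch below is illustrative only: the function name is hypothetical, the example sequence is a made-up toy string rather than the mitochondrial genome, and only the strand supplied is searched.

```python
def find_sites(sequence, target):
    """Return 1-based start positions of every (possibly overlapping) match.

    DNA input is converted to RNA notation (T -> U) so that a target written
    as TTTATGTG matches its transcript form UUUAUGUG.
    """
    seq = sequence.upper().replace("T", "U")
    tgt = target.upper().replace("T", "U")
    hits = []
    start = seq.find(tgt)
    while start != -1:
        hits.append(start + 1)
        # Advance by one base so overlapping matches are not missed.
        start = seq.find(tgt, start + 1)
    return hits


toy_transcript = "ACGUUUAUGUGCCA"  # hypothetical sequence for illustration
sites = find_sites(toy_transcript, "TTTATGTG")
print(sites, "unique" if len(sites) == 1 else "not unique")
```

Running the same scan over the full reference sequence of interest, on each strand that is transcribed, would give the hit count on which the choice of a unique target rests.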

Because human cells typically contain from about 10 to several hundred copies of mitochondria and the transient transfection of an expression vector has different efficiency among various cell types, mitochondrial gene inhibition by ASRE is better achieved through stable expression of ASRE. To homogeneously express equal amounts of ASRE in cells, tetracycline inducible stable cell lines were generated using the Flp-In system. After 24 hours of tetracycline induction, robust expression of mitoASRE(ND5) was observed in mitochondria (FIG. 21, Panel C). The transcript level of the mtND5 gene decreased by 70% as determined by real time RT-PCR, compared to un-induced cells (FIG. 21, Panel D). As a control, cells expressing DA mutant mitoASRE(ND5) showed slightly increased amounts of mtND5 mRNA, likely because the inactive mitoASRE(ND5) can protect its target from natural mRNA turnover when binding to the target. Examination of ND5 protein level by western blot also showed that the mtND5 protein was decreased in cells expressing mitoASRE(ND5) but not mutated mitoASRE(ND5) (FIG. 21, Panel E). Induction of mitoASRE(ND5) reduced the growth rate of cells.

Because mutation of mitochondrial genes is closely associated with manyhuman diseases, the engineering of mitochondrial ASRE to specificallymanipulate expression of mitochondria encoded genes has opened new doorsto assess their role in such mitochondria associated diseases. Moreover,the capability to selectively probe and perturb this organelle within aliving cell provides a novel means to examine mitochondrial functionthat is relevant to biology and physiology. This was not possible withconventional gene silencing tools like RNA interference (RNAi). To theinventors' knowledge, this is the first report to selectively probeorganelle specific gene knock down. The fact that ASRE can be targetedto any compartment of a cell makes it a useful tool that iscomplementary to conventional gene silencing methods using RNAi.

A 32 aa leader peptide from ornithine transcarbamylase (MLFNLRILLNNAAFRNGHNFMVRNFRCGQPLQ, SEQ ID NO:71) was used as a mitochondrial targeting signal for ASRE (FIG. 21, Panel B). Tetracycline inducible stable lines were produced that expressed mitoASRE(ND5) or a catalytically inactive version cloned between HindIII and NotI sites in a pCDNA5/FRT/TO construct. This construct was co-transfected with pOG44 plasmid to facilitate integration into an FRT locus of Flp-In T-Rex 293 cells (Cat. no. K6500-01, Invitrogen). Stably integrated cells were selected using DMEM medium with 10% FBS and 100 µg/ml of hygromycin B. Cells were then passaged for 8 generations on 60 mm dishes for stability and creation of isogenic lines. To induce mitoASRE expression, cells were re-plated in fresh DMEM medium with 10% FBS without any drug and induced with tetracycline (6-10 µg/ml). Twenty-four hours post induction, expression of ASRE was checked by western blot using anti-Flag antibody (M2 F1804, Sigma). Growth curves and viability assays were performed at different time points post induction. Briefly, cells were dislodged with trypsin after different growth times, mixed with 0.4% trypan blue (1:1) and counted. The proliferation assays were performed using a WST-1 assay kit (Roche, Cat. no. 11 644 807 001).

To purify RNA, the mitochondria were lysed in lysis buffer by sonication for 15 min and then total RNA was purified using a PureLink RNA mini kit (Ambion, Cat. no. 12183-018A). First strand cDNA synthesis was performed using a high capacity reverse transcription kit (Applied Biosystems, Cat. no. 4368814). RNA levels were measured by real time quantitative PCR using QPCR SYBR Green Low Rox mix (Thermo Scientific, Cat. no. AB-4322/A) with gene specific primers and normalized against GAPDH gene expression. For western blot analysis, cells were lysed in RIPA buffer and further sonicated for 15 sec on ice. Protein was measured using Bradford reagent (Pierce). A total of 30-40 µg protein was loaded for each sample and blotted. Western blot was performed using a rabbit anti-human ND5-specific antibody (1:100, Abcam, Cat. no. ab92624) overnight in the cold and detected using an HRP-linked secondary antibody (1:5000, Cell Signaling, Cat. no. 7074). The total protein was normalized against α-tubulin (Abcam, Cat. no. ab40742), which serves as a loading control. ASRE expression was confirmed by anti-Flag mouse antibody.
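Normalization of a target transcript against a reference gene such as GAPDH is commonly computed with the comparative Ct (2^-ΔΔCt) method. The specification does not state which calculation was used, so the sketch below is an assumption offered for illustration; the function name and the Ct values are hypothetical and are not data from these studies. The method also assumes roughly 100% amplification efficiency for both primer pairs.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript by the comparative Ct method.

    Each Ct is first normalized to the reference gene (delta Ct), then the
    treated sample is compared with the control (delta-delta Ct).
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target crosses threshold two cycles later in the
# induced sample than in the control, with an unchanged reference gene.
fold = relative_expression(22.0, 18.0, 20.0, 18.0)
print(f"relative expression: {fold:.2f}")  # relative expression: 0.25
```

A fold value below 1 corresponds to a decrease in transcript level relative to the control, and 1 − fold gives the fractional reduction.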

Example 3

Myotonic dystrophy (dystrophia myotonica, DM) is the most common form ofmuscular dystrophy in adults that affects 1 in 8500 individualsworldwide. The genetic mutations responsible for DM were identified asthe expanding (CUG)_(n) repeats in the 3′ UTR of DMPK mRNA (for DM1) orthe (CCUG)_(n) expansion in the intron of ZNF9 (for DM2). Suchnon-coding RNA repeats bind and sequester muscleblind proteins orincrease the level of the CUG binding proteins that regulate alternativesplicing of multiple endogenous genes critical to muscle and heartfunctions (1). Currently there is no cure for DM, although complicationsof the disease can be treated and alleviated.

The present study provides a novel approach to target and cleave the toxic RNA repeat with the artificial site-specific RNA nucleases (ASREs) as described herein. Such enzymes were constructed with an RNA binding module (PUF domain) that is specifically designed to recognize any 8-nt sequence and an endoribonuclease domain (PIN domain of SMG6) that efficiently cleaves RNA. ASREs that can specifically bind and cleave expanding RNA repeats in the cell nucleus, where the toxic RNAs are accumulated, are described herein. The present study focuses on DM1, which is the most common and severe form of DM, while it is understood that similar strategies can be developed to treat DM2.

Engineering PUF domains to recognize the (CUG)_(n) repeats with highaffinity and specificity. The native PUF domain contains 8 PUF repeats,each specifically recognizing one RNA nucleotide through hydrogenbonding with the base A, U or G. In addition, the modular “binding code”of the PUF repeat to C nucleotide has been identified, using ayeast-3-hybrid screen, making it possible to engineer a PUF domain thatspecifically recognizes any 8-nt RNA. PUF domains will be designed andengineered that specifically recognize all three possible 8-nt sequencesin a (CUG)_(n) repeat. The affinity and specificity of RNA-proteininteraction will be determined with various assays to select for amodified PUF domain that binds a (CUG)_(n) repeat with high affinity.Such PUF domains will be used either as the RNA binding modules ofASREs, or as competitors of the toxic RNA repeats that sequesterendogenous RNA binding proteins in DM1 patients.

Cleavage of the toxic (CUG)_(n) repeat in DM1 cells with artificial site-specific RNA endonucleases. PUF domains that specifically recognize (CUG)_(n) will be used to generate novel ASREs. A nuclear localization sequence will also be included in the ASRE to direct the enzyme into the nucleus where the RNA repeats accumulate. The specific cleavage of (CUG)_(n) repeats will be analyzed in vitro and in cultured DM1 cells. Experiments will be done to determine if expression of ASRE can reduce the number of ribonuclear aggregates caused by (CUG)_(n), and to further determine if the normal expression level and localization of muscleblind-like 1 (MBNL1) and CUG-binding protein 1 (CUGBP1) can be restored by ASRE. ASREs will also be tested in cultured DM1 cells for the ability to reverse the mis-splicing of genes affected in patients (such as ClC-1, cTnT and SERCA1). In addition, mRNA-seq will be used to determine (i) if normal gene expression and alternative splicing patterns can be restored at a genomic scale by the designer ASRE in DM1 cells, and (ii) the off-target effect of ASRE treatment.

Determination of the in vivo efficacy of designer ASREs using a DM1 mouse model. Adeno-associated virus (AAV) vectors will be used as a gene delivery tool to express ASRE in a DM1 mouse model (HSA^(LR) mice), and to further test if ASRE can relieve the DM1 phenotype in muscle and heart. The expression of ASRE will be analyzed in different muscle tissues and studies will be carried out to determine if the nuclear MBNL1 sequestration can be released. The splicing and the function of key DM1 marker genes (such as ClC-1, cTnT and SERCA1) will be examined to determine if they can be restored in muscles, and if the myopathy phenotype of DM1 mice can be reduced.

Unique RNA pathogenesis of myotonic dystrophy makes it difficult todevelop targeted therapy. Myotonic dystrophy (DM) is an autosomaldominant disease with multisystemic symptoms including myotonia, musclewasting, cardiac conduction defects, insulin resistance, cataracts andcognitive dysfunction (1-3). Two forms of DM are caused bymicrosatellite expansions in different genes. The more severe form, DM1,is caused by (CTG)_(n) expansion in the 3′ UTR of the dystrophiamyotonica-protein kinase gene (DMPK) (4,5); whereas DM2 is caused by(CCTG)_(n) expansion in the intron of Zinc finger protein 9 (ZNF9) (6).These non-coding mutations have a profound effect on the function ofmany genes in a trans-dominant fashion, suggesting that thegain-of-function of toxic RNA causes the clinical features (1,3).Consistent with the RNA pathogenesis model, the transgenic mouse with(CUG)₂₅₀ in an untranslated region of a different mRNA was sufficient togenerate DM1 phenotype (7).

In DM1 cells, (CUG)_(n) repeats specifically bind to splicing regulatory proteins, forming RNA-protein complexes that accumulate within the nucleus. Two classes of splicing factors are known to be affected by (CUG)_(n) (FIG. 22; Ref. 1): (i) the members of the muscleblind-like family (e.g., MBNL1) are sequestered in the nuclear foci, resulting in nuclear depletion and loss of function (8-10); (ii) the CUG binding proteins (e.g., CUGBP1) are up-regulated in DM1 cells through a PKC-mediated phosphorylation event that stabilizes the protein (11,12). Changes of these splicing factors cause splicing dysregulation in a large number of genes (13) including ClC-1 (14,15), insulin receptor (16), cTnT (11) and SERCA1 (17,18). Generally the embryonic isoforms of these genes are mis-spliced in adult DM tissues, leading to multisystemic defects in DM1 patients (19). Because the toxic RNA repeats affect two classes of splicing factors that sequentially cause mis-splicing of ~100 proteins (13), it is a complicated task to develop specific therapies targeting the molecular cause of DM1.

Several strategies have been used to develop specific therapy against DM1. One approach uses gene therapy to restore a normal level of the splicing factors affected by toxic RNA. For example, expression of muscleblind protein with an adeno-associated virus (AAV) vector restored normal adult-splicing patterns of several pre-mRNAs in muscles of DM1 mice (20). Because AAV vectors can effectively deliver genes in muscle and heart (21) and the loss of MBNL1 is a primary pathogenic event in DM1, this method produced encouraging results in a mouse model (20). The strength of such protein-based approaches is the use of gene therapy tools (like AAV) to achieve efficient delivery. However, because multiple muscleblind-like proteins are sequestered and CUGBP1 is increased in DM1 patients, conventional gene therapy approaches cannot restore the normal levels of all splicing factors affected in DM1.

Another approach is to use an antisense oligonucleotide (AON) basedmethod to cleave (CUG)_(n) repeats using RNAi (22), short antisenseoligos (23) or ribozymes (24). This strategy has produced promisingresults in cultured cells, but systematic delivery of AON in muscle andheart has been a major challenge. In addition, the existence of nuclearRNAi is still under debate as dsRNA usually induces transcriptional genesilencing rather than RNA degradation. Another AON based method is touse a morpholino oligonucleotide that blocks the (CUG)_(n) from bindingto proteins. This short AON (i.e., CAG25) led to efficient reversal ofRNA dominance and correction of many splicing defects in DM1 micecontaining (CUG)_(n) repeats (25). Given the encouraging results ofCAG25, the main challenge becomes the delivery of morpholino AON topatients. Although it was delivered to mice by intramuscular injectionfollowed by in vivo electroporation, this delivery route is notpractical for human use. Some progress has been made in identifyingsmall molecules that inhibit the binding of (CUG)_(n) to MBNL1 andrelease the sequestered MBNL1 (26-28). These small molecules are fairlytoxic as they all bind to structured RNAs. Additional work is needed tominimize the toxicity and increase their specificity to the (CUG)_(n)repeats.

Unique RNA binding mode of PUF proteins provides new hope to target (CUG)_(n) repeats. The present study employs a new therapeutic strategy using an artificial site-specific RNA endonuclease (ASRE) to specifically cleave (CUG)_(n) repeats. Engineering of such a novel enzyme takes advantage of the unique RNA recognition mode of PUF proteins, whose functions involve mediating mRNA stability and translation (29). The PUF domain of human pumilio 1 contains 8 repeats that bind consecutive bases in an anti-parallel fashion, with each repeat recognizing a single base (30) (FIG. 7). Each PUF repeat uses two amino acids to recognize the edge of the corresponding base and a third amino acid (Tyr, His or Arg) to stack between adjacent bases, causing a very specific binding between a PUF domain and an RNA. By changing two amino acids in each repeat, a PUF domain can be modified to bind most 8-nt RNA (30,31). This unique binding mode makes PUF a programmable RNA-binding domain that can be used in various artificial factors for specific splicing modulation or RNA detection (32,33). ASREs have been designed and engineered to specifically cleave RNA with an 8-nt PUF target, and in the present study, ASREs will be generated that recognize (CUG)_(n) with high affinity and efficiently cleave the RNA repeats.

Novel RNA restriction enzymes will be engineered to specifically cleave RNAs. A simple protein enzyme that cleaves RNA in a sequence-specific manner has not been found in nature, and the known RNases either cleave their targets through recognition of certain structures (e.g., RNase III family, RNase H or most ribozymes) (34-36) or have essentially no cleavage specificity (e.g., RNase A or RNase T1) (37). Development of ASREs will have broad applications for in vitro RNA manipulations, and will make gene silencing possible in organisms lacking RNAi and/or in cellular compartments where RNAi machinery does not function (such as in mitochondria).

ASREs combine the strength of current DM1 therapies and overcome some oftheir limitations. The approaches under investigation are gene therapyand antisense oligonucleotide (AON) based methods. While gene deliverywith AAV is quite efficient in muscle and heart, gene therapy does notallow for the targeting of toxic RNA but rather restores some but notall genes affected in DM patients. On the other hand, AON methods cantarget toxic RNA directly but have delivery problems. Thus, the presentstudy provides a new approach using ASREs that target RNA directly andcan be delivered into muscle and heart, for example, with AAV vectors.The designer ASREs can directly recognize and cleave (CUG)_(n) repeatslike AON does, and can be delivered to the muscle and heart of DM1patient with vectors such as AAV vectors.

ASREs will be directed into the nucleus with NLS to specifically cleave(CUG)_(n) repeats. The (CUG)_(n) repeats accumulated in the nucleus areknown to be the pathogenic molecules of DM1. As there may be apossibility that a small amount of cytoplasmic DMPK with (CUG)_(n) isneeded for normal function of DMPK, targeting only the (CUG)_(n) innuclear foci may be more specific and beneficial.

AAV vectors are proven to be efficient in delivering genes to muscle and heart, and are currently being tested in human trials. Therefore this approach can be readily transferred to an animal DM1 model and human patients.

Engineering ASREs that specifically bind and cleave RNAs. The ability to specifically cleave RNA has important applications in control of gene expression, mRNA surveillance and turnover. However, an “RNA restriction enzyme” that cleaves RNA in a sequence-specific manner has not been found in nature. To engineer such an enzyme, a modular design was used that combines a target recognition domain and a catalytic domain. The unique RNA recognition domain of human PUM1 was used, as it is possible to “reprogram” the binding specificity of the PUF domain to recognize most 8-nt RNA. A short peptide linker was used to link a designer PUF with a small endonuclease domain, the PIN domain of human SMG6, which has a well-defined molecular architecture and requires only a divalent metal cation for RNA cleavage (39,40). The resulting ASREs were expressed in and purified from an E. coli expression system and incubated with RNA substrate containing the recognition sequences. ASREs were found to efficiently recognize and cleave RNA substrates with cognate binding sites (FIG. 23), and can distinguish between substrates differing by 2 nt at the recognition site (FIG. 23, two ASREs with different recognition sites were used). By mapping the ASRE cleavage site with 3′- and 5′-RACE of gel purified products, it was found that ASREs make a single cut of RNA near the PUF binding site to generate products with 5′-phosphate and 3′-hydroxyl groups (FIG. 3, Panel C). The cleavage happens mostly downstream of the PUF binding site, suggesting a curve-back configuration of ASRE (FIG. 23, Panel B). The specifically designed ASRE can be used to silence gene expression in organisms where RNAi may not be available or active (FIG. 4).

In addition to cleaving RNA substrate in vitro, the ASRE can be used tospecifically silence gene expression by directly targeting thecorresponding RNA. An ASRE was designed to target the LacZ gene inbacterial cells, and it was shown that ASRE(LacZ) can indeed inducespecific mRNA degradation and gene silencing, whereas control ASRE orASRE with a mutated PIN domain did not affect gene expression (FIG. 4).This method is particularly useful to specifically cleave RNA in cellcompartments where RNAi is not functional (e.g., in the mitochondrionand/or nucleus), and it was found that a specifically designed ASRE canbe used to selectively degrade the mRNA of a mitochondrial-encoded geneND1.

Determination of the modular binding code of PUF domain for Cnucleotide. Because the native PUF repeats can recognize A, U, or Gresidue in a modular fashion, the challenge of making an ASRE to cleavea (CUG)_(n) repeat is to identify the C binding code of the PUF repeatto engineer specific PUF domains for (CUG)_(n) sequence. The specificityof each PUF repeat is determined by the two residues at positions 16 and20 in that repeat (31). To identify the C binding code of PUF repeat, ayeast three-hybrid (Y3H) screen was used to determine the specific aminoacid combination that enables the PUF domain to recognize C (41). Thedesign of the Y3H screen is similar to the Y2H screen except that an RNAmolecule is used as an adaptor of molecular interaction (FIG. 24, PanelA). The specific binding of the PUF domain to the RNA adaptor canrecruit the transcriptional activation domain and turn on the expressionof reporter genes. The wild-type PUF specifically recognizes Wt RNAsequence (UGUAUAUA), and the third position of this target was changedto C (UGCAUAUA; U3C RNA) and amino acids responsible for baserecognition in the PUF repeat 6 were randomized. As a positive control,the wild-type PUF can recognize WtRNA to activate LacZ gene (FIG. 24,Panel B), whereas the single mutation U3C completely abolished thebinding of WT PUF and U3C RNA, giving a very low background for thescreen.

The corresponding sequences coding for repeat 6 of the PUF domain (FIG. 24, Panel C) were randomized and cloned into the Y3H system to screen for yeast colonies that grew on His-depleted plates. The positive clones were reconfirmed by the expression of the LacZ gene, and the plasmid DNAs were purified from the double positive clones and sequenced to identify the amino acid combination in PUF repeat 6. A total of 19 positive clones were sequenced and it was found that all have the same amino acid combination (Ser in position 16 and Arg in position 20). In addition, all the triplet codes for Ser and Arg were found in the sequencing results, suggesting that the screen has good coverage of the random sequence space. Further testing was done to determine if the binding code for the C residue is selective, using RNAs with A, U and G at the third position and the PUF domain with the SxxxR (SEQ ID NO:143) sequence in repeat 6 (R6SR). The modified PUF domain binds most strongly to RNA with a C in the third position (C3) as judged by the LacZ activity of the yeast strain (FIG. 24, Panel D), whereas binding to RNAs with other bases at that position was either not detectable (for A or G) or very weak (for U). As controls, the wild-type PUF only binds to U3 RNA but not the other sequences, and a modified PUF with two amino acids inserted between Ser and Arg of repeat 6 (PUF-Eco) does not bind to any of the target RNAs (FIG. 24, Panel D).
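The statement that all triplet codes for Ser and Arg were recovered can be put in context by counting the synonymous codons for each residue in the standard genetic code. The following Python sketch is illustrative only; the table layout and the function name `codons_for` are assumptions introduced here and are not part of the screen itself.

```python
from itertools import product

# Standard genetic code written in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return all codons encoding the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


ser_codons = codons_for("S")
arg_codons = codons_for("R")

# Number of distinct (codon, codon) pairs that encode Ser at one randomized
# position and Arg at the other.
ser_arg_pairs = len(ser_codons) * len(arg_codons)
# Fraction of the two-codon (6-nt) random sequence space encoding Ser/Arg.
fraction_of_space = ser_arg_pairs / 4 ** 6

print(ser_codons, arg_codons, ser_arg_pairs, fraction_of_space)
```

Under these assumptions, Ser and Arg are each encoded by six codons, so recovering every one of them among the sequenced clones is consistent with broad sampling of the randomized positions.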

Other PUF repeats were further changed into the SxxxR (SEQ ID NO:143) amino acid combination, and it was confirmed that such modification can generate new PUF domains that specifically recognize RNA with a C at the cognate position. The identification of the modular binding code for the C residue enabled the generation of a PUF domain that recognizes any RNA sequence, making it possible to engineer artificial proteins to specifically manipulate any given RNA target.

Engineering of PUF domains to recognize (CUG)_(n) repeats with high affinity and specificity. The modular structure of the PUF domain enabled the programming of the specificity of each PUF repeat independently. After obtaining the “C binding code” with the Y3H screen of PUF repeat 6, other PUF repeats were changed into this modular code and it was confirmed that such modifications produced PUF domains that recognize RNAs with C at corresponding positions. This result suggested that the binding code is indeed “modular,” which will enable the design of a PUF domain that specifically recognizes any 8-nt RNA in the (CUG)_(n) repeat. The (CUG)_(n) repeat can generate three different RNA octamers (CUGCUGCU, UGCUGCUG and GCUGCUGC) according to different frames (FIG. 25, Panel A). Through step-wise mutagenesis on each PUF repeat, three PUF domains can be engineered, each recognizing one of the possible 8-nt sequences in (CUG)_(n). The binding affinity of PUF:(CUG)_(n) will be assayed either with the Y3H assay (by measuring the β-gal activity (42)), or with a standard electrophoretic mobility shift assay (EMSA) using purified PUFs.
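The three reading frames of the repeat can be enumerated directly. The short Python sketch below is illustrative; the function name `repeat_octamers` and its parameters are hypothetical and were chosen only for this example.

```python
def repeat_octamers(unit="CUG", length=8):
    """List the distinct windows of `length` nt found in a tandem repeat.

    One window is taken per starting offset within the repeat unit, which
    corresponds to the frames described in the text.
    """
    # Build a tract long enough to hold a full window at every offset.
    tract = unit * (length // len(unit) + 2)
    return [tract[offset:offset + length] for offset in range(len(unit))]


frames = repeat_octamers()
for number, octamer in enumerate(frames, start=1):
    print(f"frame {number}: {octamer}")
```

Each of the three octamers would be the target of one designer PUF domain.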

FIG. 25, Panel B shows two examples of PUF domains (PUF-D and PUF-E) that were step-wise mutated in each PUF repeat to recognize frames 1 and 2 of (CUG)_(n). The starting PUF domains were generated for other reasons to recognize a different sequence, and 1-3 PUF repeats were mutated in each step to change the binding specificity of that repeat (indicated with arrows in FIG. 25, Panel B). This process generates several intermediate PUF domains (PUF-A to C) that recognize different 8-nt targets. The intermediate PUFs and final PUFs were co-transfected with (CUG)₅ RNA into the Y3H system, and assays were carried out to detect binding of the different PUFs to the (CUG)₅ RNA, which contains only a single copy of the 8-nt target. It was found that only PUF-D and PUF-E, not the intermediate PUFs (PUF-A to C) or the wt-PUF, can bind (CUG)₅ to mediate LacZ expression (FIG. 25, Panel C). The binding of designer PUFs to the (CUG)₅ repeat is very specific, with much higher affinity than the binding of WT-PUF to its target as judged by LacZ expression (FIG. 25, Panel C). This experiment demonstrated that the newly identified “C code” could indeed be used to engineer PUFs.

PUF domains will be generated that bind to the third frame of (CUG)_(n). The resulting PUFs for all three frames will be expressed and purified to measure the binding affinity to synthesized (CUG)_(n) (n=10, 20 and 50; RNA will be chemically synthesized or transcribed in vitro). A determination will also be made of the PUF:(CUG)_(n) binding stoichiometry, which may depend on RNA length. Studies will be conducted to examine if the designer PUF domains can bind to control sequences such as (GUC)_(n) and (CAG)_(n) that form similar structures but have different 8-nt sub-sequences. PUF domains will be generated that can be used either as the RNA binding modules of ASREs, or as competitors of the endogenous MBNL1 sequestered by the toxic RNA repeats.

Engineering and optimizing ASREs that recognize (CUG)_(n) repeats. ASREs will be constructed with an N-terminal PUF domain and C-terminal PIN domain linked by a short linker (FIG. 23, Panel A). A nuclear localization signal (NLS) that directs the ASRE to the nucleus and a His-tag for detection/purification will be included. The NLS from SV40 large T-antigen (PKKKRKV, SEQ ID NO:72) will be used, which does not interfere with functions of PUF fusion proteins (33). At least three designer ASREs will be engineered with PUF domains recognizing all 8-nt sequences in (CUG)_(n) repeats (FIG. 25, Panel A).

The ASREs will be expressed and purified, and their activities will be tested in vitro using (CUG)_(n) substrates (n=10, 20 and 50). The RNA products will be separated by denaturing PAGE to assess the cleavage results. The catalytic efficiencies of the new ASREs will be compared to the previous ones that have k_(cat)/K_(m) in the range of ~10⁷ M⁻¹ min⁻¹. Since previous ASREs made a single cleavage in an RNA substrate with one PUF binding site, studies will be conducted to examine how the new ASREs will cut (CUG)_(n) containing multiple target sites.

The amino acid combination that can make the PUF repeat recognize C has been identified. Since the native PUF domains were never found to be able to recognize C, the Y3H screen makes it possible to engineer an RNA recognition domain for any 8-nt target. However, the binding selectivity against other bases may not be optimal, and it was found that the newly modified PUF can recognize a U residue in the third position with a low affinity (FIG. 24, Panel D). The binding specificity will be optimized by molecular modeling and mutagenesis. The PUF-RNA interface near the C nucleotide will be modeled using the PUF structure and the Rosetta software platform (43). The goal is to identify additional PUF mutations that decrease affinity to A, U and G while maintaining the affinity to the C nucleotide.

A slight compromise of the binding affinity between designer PUFs and their target should not be a serious obstacle to the goal of recognizing (CUG)_(n) repeats, as the long repeats contain many copies of the same sequence. The results of molecular modeling will be applied to mutate the amino acid at position 17 of each repeat, which is responsible for the stacking of the RNA base on the PUF repeat. Different amino acids will be tested in this position to increase the selectivity, even at the expense of decreased binding affinity. The resulting PUFs will be optimized to recognize (CUG)_(n) and to have a balance between binding affinity and sequence selectivity.

Cleavage of toxic (CUG)_(n) repeats in DM1 cells with artificial RNA endonuclease. Mammalian expression vectors of designer ASREs will be generated to test if they can cleave (CUG)_(n) inside cultured cells. Cultured muscle cells from transgenic DM1 mice (HSA^(LR) mice; University of Rochester) and human DM1 fibroblasts (Coriell Cell Repositories) will be used. The cultured cells will be transfected with either transient or stable expression vectors containing designer ASRE genes, and assayed to determine (i) the expression and nuclear localization of ASREs, (ii) if the ASREs can disperse the nuclear foci of (CUG)_(n), (iii) if the (CUG)_(n) RNA is degraded, and (iv) if MBNL1 is released from the nuclear foci by ASREs. Western blot will be used to examine if the level of CUGBP1 is decreased in the ASRE-transfected cells.

Further testing will be done to determine the level and localization of related splicing factors such as MBNL2, MBXL and CUGBP2, which may play partially overlapping roles with MBNL1 and CUGBP1 (10). In addition, RT-PCR will be used to examine the alternative splicing events in genes that are affected by (CUG)_(n) in DM1 patients. Focus will be on splicing of ClC-1, insulin receptor, cTnT and SERCA1, all of which are known to undergo a splicing shift in DM1 cells to produce embryonic isoforms. The reversal of the splicing by designer ASREs in cultured DM1 cells is the expected outcome.

Determination of how the designer ASREs affect alternative splicing on a genomic scale. Studies will be conducted to further examine how the designer ASREs will affect gene expression and alternative splicing on a genomic scale. High throughput (HTP) sequencing of total mRNA will be carried out using mRNAs from DM1 cells treated with designer ASREs, control ASRE and empty vector. Changes in both total gene expression and ratios of splicing isoforms will be determined. In other projects a reliable protocol was used to generate a directional cDNA library for HTP sequencing. It was possible to obtain >20 million reads of 75 nt in a single Illumina sequencing lane, and the Bowtie-TopHat pipeline was used to analyze the alternative splicing events. The changes of each alternative splicing event between different samples can be scored using Fisher's exact test (44). The mRNA-seq data will be compared with the previous splicing microarray result in DM1 mice (13), and a determination will be made regarding whether the same set of alternative splicing events can be reversed by designer ASREs.
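As an illustration of how a single alternative splicing event might be scored between two samples with Fisher's exact test, the following Python sketch computes a two-sided p-value for a 2x2 table of read counts. The function name, the choice of inclusion and exclusion read counts as the table entries, and the example numbers are all assumptions made for this sketch; the text does not specify how the counts are tabulated.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Here a, b are taken to be inclusion and exclusion reads in sample 1,
    and c, d the same counts in sample 2 (an assumed layout).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p
        for p in map(table_probability, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts for one exon in control-treated and ASRE-treated cells.
p_small_table = fisher_exact_two_sided(3, 1, 1, 3)
p_splicing_shift = fisher_exact_two_sided(120, 80, 60, 140)
print(p_small_table, p_splicing_shift)
```

In practice, one p-value would be obtained per splicing event, and a correction for multiple testing would normally be applied across events.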

Because the mRNA-seq experiments can generate a large amount of data for the expression of all genes, such data can be used to assess the off-target effect of ASRE treatment. The (CUG)_(n) repeat will be the best substrate of the designer ASRE as it contains many copies of ASRE targets. Other nuclear RNA may be cleaved to a lesser extent, and mRNA in the cytoplasm will largely evade cleavage. Therefore a large off-target effect is not expected. However, in case other mRNAs containing the same 8-nt sequence are cleaved and such genes are critical to normal function of cells, a computer will be used to scan the mRNA for the ASRE target, and a combination of multiple ASREs that target different 8-nt sequences in (CUG)_(n) will be used to minimize the off-target effect. Using multiple ASREs, each at a lower level, will decrease the off-target effects as only the (CUG)_(n) can be targeted by all designer ASREs.
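The computational scan mentioned above could be as simple as an exact-match search for each 8-nt target. The Python sketch below is a minimal illustration; the function names and the example transcripts are hypothetical, and a real scan would run over annotated transcript sequences.

```python
def find_sites(rna, target):
    """Return the 0-based start positions of every (overlapping) match."""
    sites = []
    start = rna.find(target)
    while start != -1:
        sites.append(start)
        start = rna.find(target, start + 1)
    return sites


def targeted_by_all(transcripts, targets):
    """Names of transcripts that contain every ASRE target sequence."""
    return [
        name
        for name, rna in transcripts.items()
        if all(find_sites(rna, target) for target in targets)
    ]


# The three 8-nt frames of the (CUG)n repeat.
ASRE_TARGETS = ["CUGCUGCU", "UGCUGCUG", "GCUGCUGC"]

# Hypothetical sequences used only to exercise the functions.
example_transcripts = {
    "expanded_repeat": "CUG" * 10,
    "single_frame_match": "AAGCUGCUGCAA",
    "no_match": "AUGGCCAUUGUAAUGGGCC",
}

for name, rna in example_transcripts.items():
    hits = {target: find_sites(rna, target) for target in ASRE_TARGETS}
    print(name, hits)
print(targeted_by_all(example_transcripts, ASRE_TARGETS))
```

In this toy input, only the repeat tract is matched by all three targets, which mirrors the rationale for combining multiple ASREs.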

Determination of the in vivo efficacy of designer ASREs using a DM1 mouse model. In these studies, the in vivo efficacy of ASRE will be determined in an animal model of DM1. In particular, viral vectors will be used to deliver ASREs into a DM1 mouse model (HSA^(LR) mice) and tests will be done to determine if the ASREs can correct the alternative splicing of key genes and relieve the DM1 phenotype in muscle and heart.

Generation of AAV virus particles carrying ASRE with AAV vectors. Because of the relatively small size of the ASRE gene (~1.5 kb), it can be packaged with an AAV vector that has been shown to efficiently deliver genes into skeletal muscles and heart (21,45,46).

ASREs that can effectively cleave the (CUG)_(n) repeat in vitro will be cloned into different AAV vectors. As controls, ASREs that recognize different RNA sequences and the ASRE with a mutated active site in the PIN domain will be used. Two types of promoters will be tested to drive the expression of ASRE in AAV: the first is the CMV promoter that is commonly used in AAV systems for high level expression in all tissues; the second type are muscle-specific promoters such as the myogenin promoter and/or synthetic muscle promoters (47). The expression efficiency of the two types of promoters will be analyzed in cultured skeletal muscle cells, and the muscle-specific promoters will be selected if the expression level is comparable to CMV. The goal is to make ASRE expression more specific to muscles, thereby limiting the possible off-target effects. Upon confirming expression in muscle cells, AAV particles will be generated that contain ASREs.

Local delivery of ASREs with AAV vectors. To test the efficiency of ASREs in the whole animal, experiments will be carried out using intramuscular injection of AAV particles into mouse vastus muscles. A DM1 mouse model (HSA^(LR) mouse) that contains ~250 non-coding (CUG)_(n) repeats and that can reproduce the DM phenotypes (7) will be used. AAV serotype 9 (AAV9), which can mediate high level gene expression in muscle and heart by either systemic injection or local injection (48, 49), will be used. It is expected that the muscle tissues near the injection region will stably express ASREs 1-2 weeks after injection, leading to the cleavage of (CUG)_(n) repeats.

To determine the in vivo effects of ASREs, the muscle fibers will be dissected and assayed for the accumulation of the (CUG)_(n) repeat with fluorescence in situ hybridization (FISH). Assays will also be done to determine the distribution pattern of MBNL1, which accumulates as nuclear foci in the HSA^(LR) mouse but has a dispersed nuclear localization pattern in the normal mouse. The level of other proteins (such as CUGBP1) whose expression is changed by (CUG)_(n) repeats in DM1 patients will also be measured, and it will be determined if the ASRE can restore normal expression levels of these proteins. In addition, RNA will be purified from the muscle tissues in the injected region and RT-PCR will be used to assay for the alternative splicing pattern of key genes that were affected in the HSA^(LR) mouse (including ClC-1, cTnT and SERCA1). It is expected that the splicing pattern of these genes will be shifted towards the normal adult isoforms by ASREs compared to the controls. To determine whether AAV treatment can rescue the physiological deficits of DM1, the expression and function of the muscle-specific chloride channel (ClC-1) will be examined. Finally, it will be determined whether the ASRE treatments can reduce the myotonia phenotype of muscles using electromyography (EMG) (7).

Systemic delivery of ASREs with AAV9. In the second stage of experiments, intravenous injection will be used to deliver ASRE-expressing virus (AAV9) into HSA^(LR) mice. Since AAV9 can facilitate robust gene expression in muscles and heart after systemic delivery (48, 49), it is expected that ASREs will be highly expressed in muscles and heart several days after IV injection. Similar experiments will be carried out to test the in vivo effects of ASREs on muscle tissues as described herein. Briefly, (1) assays for the accumulation of (CUG)_(n) repeats will be carried out with FISH to determine if ASRE can disperse the nuclear foci; (2) assays for the distribution pattern of MBNL1 and the expression levels of CELF proteins will be carried out; (3) assays for alternative splicing patterns for key genes that were affected in the HSA^(LR) mouse (including ClC-1, cTnT and SERCA1) will be carried out; and (4) assays to determine whether AAV treatment can rescue the physiological deficits of DM1, including the electrophysiological properties of the muscle-specific chloride channel (ClC-1) and the myotonia phenotype of muscles, will be carried out.

Further studies will be conducted to examine if the ASRE treatment can reduce the histologically defined myopathy. In addition, the HSA^(LR) mice had a mortality of 41% by 44 weeks after weaning (compared to <5% for non-transgenic mice) (7); thus studies will be conducted to determine if the ASRE treatment can reduce the mortality rate of DM1 mice.

References for Example 3

-   1. Lee J E, Cooper T A. Pathogenic mechanisms of myotonic dystrophy. Biochem Soc Trans. 2009; 37(Pt 6):1281-6.
-   2. Turner C, Hilton-Jones D. The myotonic dystrophies: diagnosis and management. J Neurol Neurosurg Psychiatry. 2010; 81(4):358-67.
-   3. Wheeler T M, Thornton C A. Myotonic dystrophy: RNA-mediated muscle disease. Curr Opin Neurol. 2007; 20(5):572-6.
-   4. Mahadevan M, Tsilfidis C, Sabourin L, Shutler G, Amemiya C, Jansen G, et al. Myotonic dystrophy mutation: an unstable CTG repeat in the 3′ untranslated region of the gene. Science. 1992; 255(5049):1253-5.
-   5. Brook J D, McCurrach M E, Harley H G, Buckler A J, Church D, Aburatani H, et al. Molecular basis of myotonic dystrophy: expansion of a trinucleotide (CTG) repeat at the 3′ end of a transcript encoding a protein kinase family member. Cell. 1992; 68(4):799-808.
-   6. Liguori C L, Ricker K, Moseley M L, Jacobsen J F, Kress W, Naylor S L, et al. Myotonic dystrophy type 2 caused by a CCTG expansion in intron 1 of ZNF9. Science. 2001; 293(5530):864-7.
-   7. Mankodi A, Logigian E, Callahan L, McClain C, White R, Henderson D, et al. Myotonic dystrophy in transgenic mice expressing an expanded CUG repeat. Science. 2000; 289(5485):1769-73.
-   8. Miller J W, Urbinati C R, Teng-Umnuay P, Stenberg M G, Byrne B J, Thornton C A, et al. Recruitment of human muscleblind proteins to (CUG)(n) expansions associated with myotonic dystrophy. EMBO J. 2000; 19(17):4439-48. PMCID: PMC302046.
-   9. Jiang H, Mankodi A, Swanson M S, Moxley R T, Thornton C A. Myotonic dystrophy type 1 is associated with nuclear foci of mutant RNA, sequestration of muscleblind proteins and deregulated alternative splicing in neurons. Hum Mol Genet. 2004; 13(24):3079-88.
-   10. Fardaei M, Rogers M T, Thorpe H M, Larkin K, Hamshere M G, Harper P S, et al. Three proteins, MBNL, MBLL and MBXL, co-localize in vivo with nuclear foci of expanded-repeat transcripts in DM1 and DM2 cells. Hum Mol Genet. 2002; 11(7):805-14.
-   11. Philips A V, Timchenko L T, Cooper T A. Disruption of splicing regulated by a CUG-binding protein in myotonic dystrophy. Science. 1998; 280(5364):737-41.
-   12. Kuyumcu-Martinez N M, Wang G S, Cooper T A. Increased steady-state levels of CUGBP1 in myotonic dystrophy 1 are due to PKC-mediated hyperphosphorylation. Mol Cell. 2007; 28(1):68-78. PMCID: PMC2083558.
-   13. Du H, Cline M S, Osborne R J, Tuttle D L, Clark T A, Donohue J P, et al. Aberrant alternative splicing and extracellular matrix gene expression in mouse models of myotonic dystrophy. Nat Struct Mol Biol. 2010; 17(2):187-93. PMCID: PMC2852634.
-   14. Charlet B N, Savkur R S, Singh G, Philips A V, Grice E A, Cooper T A. Loss of the muscle-specific chloride channel in type 1 myotonic dystrophy due to misregulated alternative splicing. Mol Cell. 2002; 10(1):45-53.
-   15. Mankodi A, Takahashi M P, Jiang H, Beck C L, Bowers W J, Moxley R T, et al. Expanded CUG repeats trigger aberrant splicing of ClC-1 chloride channel pre-mRNA and hyperexcitability of skeletal muscle in myotonic dystrophy. Mol Cell. 2002; 10(1):35-44.
-   16. Savkur R S, Philips A V, Cooper T A. Aberrant regulation of insulin receptor alternative splicing is associated with insulin resistance in myotonic dystrophy. Nat Genet. 2001; 29(1):40-7.
-   17. Kimura T, Nakamori M, Lueck J D, Pouliquin P, Aoike F, Fujimura H, et al. Altered mRNA splicing of the skeletal muscle ryanodine receptor and sarcoplasmic/endoplasmic reticulum Ca2+-ATPase in myotonic dystrophy type 1. Hum Mol Genet. 2005; 14(15):2189-200.
-   18. Hino S, Kondo S, Sekiya H, Saito A, Kanemoto S, Murakami T, et al. Molecular mechanisms responsible for aberrant splicing of SERCA1 in myotonic dystrophy type 1. Hum Mol Genet. 2007; 16(23):2834-43.
-   19. Kalsotra A, Xiao X, Ward A J, Castle J C, Johnson J M, Burge C B, et al. A postnatal switch of CELF and MBNL proteins reprograms alternative splicing in the developing heart. Proc Natl Acad Sci USA. 2008; 105(51):20333-8. PMCID: PMC2629332.
-   20. Kanadia R N, Shin J, Yuan Y, Beattie S G, Wheeler T M, Thornton C A, et al. Reversal of RNA missplicing and myotonia after muscleblind overexpression in a mouse poly(CUG) model for myotonic dystrophy. Proc Natl Acad Sci USA. 2006; 103(31):11748-53. PMCID: PMC1544241.
-   21. Wang Z, Zhu T, Qiao C, Zhou L, Wang B, Zhang J, et al. Adeno-associated virus serotype 8 efficiently delivers genes to muscle and heart. Nat Biotechnol. 2005; 23(3):321-8.
-   22. Langlois M A, Boniface C, Wang G, Alluin J, Salvaterra P M, Puymirat J, et al. Cytoplasmic and nuclear retained DMPK mRNAs are targets for RNA interference in myotonic dystrophy cells. J Biol Chem. 2005; 280(17):16949-54.
-   23. Mulders S A, van den Broek W J, Wheeler T M, Croes H J, van Kuik-Romeijn P, de Kimpe S J, et al. Triplet-repeat oligonucleotide-mediated reversal of RNA toxicity in myotonic dystrophy. Proc Natl Acad Sci USA. 2009; 106(33):13915-20. PMCID: PMC2728995.
-   24. Langlois M A, Lee N S, Rossi J J, Puymirat J. Hammerhead ribozyme-mediated destruction of nuclear foci in myotonic dystrophy myoblasts. Mol Ther. 2003; 7(5 Pt 1):670-80.
-   25. Wheeler T M, Sobczak K, Lueck J D, Osborne R J, Lin X, Dirksen R T, et al. Reversal of RNA dominance by displacement of protein sequestered on triplet repeat RNA. Science. 2009; 325(5938):336-9.
-   26. Gareiss P C, Sobczak K, McNaughton B R, Palde P B, Thornton C A, Miller B L. Dynamic combinatorial selection of molecules capable of inhibiting the (CUG) repeat RNA-MBNL1 interaction in vitro: discovery of lead compounds targeting myotonic dystrophy (DM1). J Am Chem Soc. 2008; 130(48):16254-61. PMCID: PMC2645920.
-   27. Arambula J F, Ramisetty S R, Baranger A M, Zimmerman S C. A simple ligand that selectively targets CUG trinucleotide repeats and inhibits MBNL protein binding. Proc Natl Acad Sci USA. 2009; 106(38):16068-73. PMCID: PMC2752522.
-   28. Warf M B, Nakamori M, Matthys C M, Thornton C A, Berglund J A. Pentamidine reverses the splicing defects associated with myotonic dystrophy. Proc Natl Acad Sci USA. 2009; 106(44):18551-6. PMCID: PMC2774031.
-   29. Wickens M, Bernstein D S, Kimble J, Parker R. A PUF family portrait: 3′UTR regulation as a way of life. Trends Genet. 2002; 18(3):150-7.
-   30. Wang X, McLachlan J, Zamore P D, Hall T M. Modular recognition of RNA by a human pumilio-homology domain. Cell. 2002; 110(4):501-12.
-   31. Cheong C G, Hall T M. Engineering RNA sequence specificity of Pumilio repeats. Proc Natl Acad Sci USA. 2006; 103(37):13635-9. PMCID: PMC1564246.
-   32. Ozawa T, Natori Y, Sato M, Umezawa Y. Imaging dynamics of endogenous mitochondrial RNA in single living cells. Nat Methods. 2007; 4(5):413-9.
-   33. Wang Y, Cheong C G, Hall T M, Wang Z. Engineering splicing factors with designed specificities. Nat Methods. 2009; 6(11):825-30.
-   34. MacRae I J, Doudna J A. Ribonuclease revisited: structural insights into ribonuclease III family enzymes. Curr Opin Struct Biol. 2007; 17(1):138-45.
-   35. Champoux J J, Schultz S J. Ribonuclease H: properties, substrate specificity and roles in retroviral reverse transcription. FEBS J. 2009; 276(6):1506-16.
-   36. Scott W G. Ribozymes. Curr Opin Struct Biol. 2007; 17(3):280-6.
-   37. Yoshida H. The ribonuclease T1 family. Methods Enzymol. 2001; 341:28-41.
-   38. Hammond S M, Wood M J. PRO-051, an antisense oligonucleotide for the potential treatment of Duchenne muscular dystrophy. Curr Opin Mol Ther. 2010; 12(4):478-86.
-   39. Takeshita D, Zenno S, Lee W C, Saigo K, Tanokura M. Crystal structure of the PIN domain of human telomerase-associated protein EST1A. Proteins. 2007; 68(4):980-9.
-   40. Glavan F, Behm-Ansmant I, Izaurralde E, Conti E. Structures of the PIN domains of SMG6 and SMG5 reveal a nuclease within the mRNA surveillance complex. EMBO J. 2006; 25(21):5117-25.
-   41. Hook B, Bernstein D, Zhang B, Wickens M. RNA-protein interactions in the yeast three-hybrid system: affinity, sensitivity, and enhanced library screening. RNA. 2005; 11(2):227-33.
-   42. Stumpf C R, Opperman L, Wickens M. Chapter 14. Analysis of RNA-protein interactions using a yeast three-hybrid system. Methods Enzymol. 2008; 449:295-315.
-   43. Das R, Baker D. Macromolecular modeling with Rosetta. Annu Rev Biochem. 2008; 77:363-82.
-   44. Wang E T, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, et al. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008; 456(7221):470-6. PMCID: PMC2593745.
-   45. Zhu T, Zhou L, Mori S, Wang Z, McTiernan C F, Qiao C, et al. Sustained whole-body functional rescue in congestive heart failure and muscular dystrophy hamsters by systemic gene transfer. Circulation. 2005; 112(17):2650-9.
-   46. Wang B, Li J, Fu F H, Xiao X. Systemic human minidystrophin gene transfer improves functions and life span of dystrophin and dystrophin/utrophin-deficient mice. J Orthop Res. 2009; 27(4):421-6.
-   47. Li X, Eastman E M, Schwartz R J, Draghia-Akli R. Synthetic muscle promoters: activities exceeding naturally occurring regulatory sequences. Nat Biotechnol. 1999; 17(3):241-5.
-   48. Kornegay S N, Li J, Bogan J R, Bogan D J, Chen C, Zheng H, et al. Widespread muscle expression of an AAV9 human mini-dystrophin vector after intravenous injection in neonatal dystrophin-deficient dogs. Mol Ther. 2010; 18(8):1501-8. PMCID: PMC2927072.
-   49. Inagaki K, Fuess S, Storm T A, Gibson G A, McTiernan C F, Kay M A, et al. Robust systemic transduction with AAV9 vectors in mice: efficient global cardiac gene transfer superior to that of AAV8. Mol Ther. 2006; 14(1):45-53. PMCID: PMC1564441.

Example 4

PUF proteins possess a recognition code for bases A, U and G, allowing designed RNA sequence specificity of their modular Pumilio (PUM) repeats. However, recognition side chains in a PUM repeat for cytosine are unknown. Described herein is the identification of a cytosine-recognition code by screening random amino acid combinations at conserved RNA recognition positions using a yeast 3-hybrid system. This C-recognition code is specific and modular, as specificity can be transferred to different positions in the RNA recognition sequence. A crystal structure of a modified PUF domain reveals specific contacts between an arginine side chain and the cytosine base. The C-recognition code was applied to design PUF domains that recognize targets with multiple cytosines and to generate engineered splicing factors that modulate alternative splicing. A divergent yeast PUF protein, Nop9p, that may recognize natural target RNAs with cytosine was also identified. This work furthers understanding of natural PUF protein target recognition and expands the ability to engineer PUF domains to recognize any RNA sequence.

The specific interaction of RNA and protein plays vital roles in RNA regulation including splicing, localization, translation and degradation. Such recognition may be directed toward unstructured RNA requiring discrimination of RNA sequences, folded RNA motifs, or some combination of sequence and structural specificity (1). Members of the PUF protein family (named after Drosophila Pumilio and Caenorhabditis elegans fem-3 mRNA binding factor [FBF]) are sequence-specific RNA-binding proteins that regulate networks of mRNAs encoding proteins of related function (2-7). PUF proteins generally recognize the 3′-UTR of their target mRNAs to control the mRNA stability and translation (2-7).

The RNA-binding domain of PUF proteins, known as the Pumilio homology domain (PUM-HD) or PUF domain, can bind to unstructured RNA sequences in a distinct fashion. The PUF domain of human Pumilio1 contains eight PUM repeats, each containing three α-helices packed together in a curved structure (8-10). RNA is bound as an extended strand to the concave surface of the PUF domain with the bases contacted by protein side chains. In general, each PUM repeat recognizes a single RNA base through the second helix (α2) in an antiparallel arrangement, i.e., nucleotides 1-8 are recognized by PUF repeats 8-1, respectively. The α2 helices of PUM repeats contain a five-residue sequence, designated here as 12xx5, where the side chain at position 2 stacks with the recognized base and the side chains at positions 1 and 5 recognize the edge of the base (8,11) (FIG. 27, Panel A). Specific residues at these positions direct the base recognition properties of the repeat. This PUF-RNA recognition code makes it possible to modify a PUM repeat to bind a particular RNA base, producing a designed PUF domain that specifically recognizes a given 8-nt RNA target. Such de novo designed RNA binders have been used to track RNA localization in cells (12,13), study PUF protein function (14,15), and modulate alternative splicing (16), and continue to provide a useful tool for biomedical research with possible therapeutic applications.
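The antiparallel arrangement described above amounts to a simple index mapping between nucleotide position and repeat number. The following Python sketch is illustrative; the function name `repeat_assignments` is hypothetical and the example target is the U3C RNA sequence (UGCAUAUA) used in the yeast three-hybrid screen.

```python
def repeat_assignments(target_rna):
    """Map each PUM repeat (1-8) to the base it would contact.

    Nucleotide position p (1-based, 5' to 3') is paired with repeat 9 - p,
    reflecting the antiparallel binding mode.
    """
    if len(target_rna) != 8:
        raise ValueError("a PUF domain with eight repeats binds an 8-nt target")
    return {
        9 - position: base
        for position, base in enumerate(target_rna.upper(), start=1)
    }


assignments = repeat_assignments("UGCAUAUA")
for repeat in sorted(assignments):
    print(f"repeat {repeat} -> {assignments[repeat]}")
```

With this mapping, the cytosine at the third position of the target falls on repeat 6, the repeat whose recognition residues were randomized in the screen.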

One limitation to application of designed PUF proteins is that, although the modular code for recognition of RNA bases A, U and G has been deduced, a code for cytosine recognition by a PUM repeat is unknown. Thus recognition of a cytosine cannot be engineered in a repeat, although Pumilio 1 can accept any base including cytosine at the 5th position of the target sequence and yeast Puf3p specifically recognizes a cytosine two bases upstream of the core PUF recognition sequence (17). Naturally occurring PUM repeats that specifically recognize a cytosine have not been identified, providing no clues to a cytosine-recognition code and leaving uncertainty about whether such specific recognition exists or is possible. The identification of a combination of amino acid side chains in a PUM repeat that can recognize a cytosine is necessary to expand the use of designed PUF domains directed toward any RNA sequence.

Using a yeast 3-hybrid system, it was found that the five-residue RNA-interaction sequence SYxxR (SEQ ID NO:4) allows PUM repeats of human Pumilio 1 (hereafter referred to as PUF for simplicity) to specifically interact with cytosine. In a crystal structure of a complex between a mutant PUF(SYxxR) and cognate RNA, the arginine side chain interacts directly with the cytosine and the serine side chain helps to position the arginine residue. This recognition code is applied as described herein to design new PUF domains to recognize RNA targets with multiple cytosine residues such as CUG repeats that are responsible for the pathogenesis of myotonic dystrophy. The code is also used to engineer splicing factors that modulate alternative splicing of both a splicing reporter and an endogenous gene. Furthermore, a naturally occurring yeast PUF protein, Nop9p, appears to contain a repeat with a code for cytosine and is conserved in homologs from yeast to human, suggesting that the natural target sequences of these PUF proteins may contain cytosine (FIG. 32).

Generation of a random sequence library. A PUF mutant library was generated through three PCR amplifications using primers with randomized regions. In reaction 1, the 5′ portion of the Pumilio 1 PUF domain was amplified from wild-type PUF with primers Bam-Puf-1F (5′-GGA TCC GAG GCC GCA GCC GCC TTT TGG AA, SEQ ID NO:74) and Puf-R6N-2R (5′-GAT TAC ATA NNN TCC ATA TTG ATC CTG TAC CAG, SEQ ID NO:75). In reaction 2, the 3′ portion of the PUF domain was amplified with primers Puf-R6N-1F (5′-TAT GTA ATC NNN CAT GTA CTG GAG CAC GGT CG, SEQ ID NO:76) and Puf-Xho-2R (5′-CTC GAG CCC CTA AGT CAA CAC CGT TCT TC, SEQ ID NO:77). The Puf-R6N-2R primer contains 3 random nucleotides encoding the amino acid at position 1043, while Puf-R6N-1F contains random nucleotides encoding the residue at position 1047. The purified PCR products of reactions 1 and 2 were mixed as the template for reaction 3 with primers Bam-Puf1-1F and Puf-Xho-2R. The final PCR products encode the entire PUF domain and have the two randomized codons at positions 1043 and 1047.
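The way the two randomized primers allow the products of reactions 1 and 2 to be joined in reaction 3 can be illustrated by checking their overlap in silico. The Python sketch below uses the two primer sequences given above with spaces removed; the function and variable names are assumptions introduced for illustration.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (N is kept as N)."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap(left, right):
    """Longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return ""


PUF_R6N_2R = "GATTACATANNNTCCATATTGATCCTGTACCAG"  # reverse primer, reaction 1
PUF_R6N_1F = "TATGTAATCNNNCATGTACTGGAGCACGGTCG"   # forward primer, reaction 2

# Sense-strand sequence at the 3' end of the reaction 1 product.
sense_end_of_product_1 = reverse_complement(PUF_R6N_2R)
shared = overlap(sense_end_of_product_1, PUF_R6N_1F)
# Sense-strand junction expected after the two products are fused.
junction = sense_end_of_product_1 + PUF_R6N_1F[len(shared):]

print(shared)
print(junction)
```

Under this reading, the two primers share a 9-nt complementary stretch (TAT GTA ATC on the sense strand, i.e., Tyr-Val-Ile), so the two randomized codons are separated by three fixed codons, consistent with positions 1043 and 1047.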

A yeast expression plasmid encoding wild-type PUF fused at the N terminus to the Gal-4 AD was created by amplifying the coding sequence of the PUF domain from pTYB3-HsPUM1-HD (9) and subcloning it into the pACT2 plasmid using BamHI and XhoI sites. Plasmids expressing target RNAs were made by annealing DNA oligonucleotides encoding the desired RNAs and subcloning into the pIIIA-MS2-2 plasmid using SmaI and SphI restriction sites.

Y3H assays were performed in yeast strain YBZ-1 as described previously (18,19). For the Y3H screen, instead of generating an E. coli plasmid library, a yeast library screening system was generated directly through gap-repair. First, the pIIIA-MS2-2 plasmid carrying UGCAUAUA RNA was transformed and expressed in yeast strain YBZ-1. Second, an EcoRI site was introduced by site-directed mutagenesis into wild-type pACT2-PUF between the nucleotides encoding positions 1043 and 1047. The pACT2-PUF-Eco DNA was linearized by EcoRI and co-transformed with the random PUF PCR library at a molar ratio of 6:1 into the yeast. About 50,000 yeast clones were generated, giving at least 10-fold coverage of the entire 6-nt sequence space (4⁶ = 4096). Yeast transformants were screened on plates lacking histidine and containing 10 mM 3-AT. The transformants that survived HIS growth selection were confirmed with LacZ expression. Selected yeast plasmid DNAs were sequenced and reintroduced into the mother strain to confirm the interaction and specificity.
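The coverage estimate above can be checked with a short calculation. The following is a minimal sketch, not part of the described method; the function name is illustrative, and the Poisson estimate of missed variants assumes that clones sample the library uniformly and independently, which the text does not state.

```python
import math

RANDOMIZED_NT = 6    # two NNN codons (positions 1043 and 1047)
CLONES = 50_000      # approximate number of yeast clones reported


def library_coverage(clones: int, randomized_nt: int) -> tuple[int, float, float]:
    """Return (sequence space, fold coverage, expected fraction of variants missed)."""
    space = 4 ** randomized_nt
    fold = clones / space
    # Assumption: uniform, independent sampling (Poisson approximation).
    missed = math.exp(-fold)
    return space, fold, missed


space, fold, missed = library_coverage(CLONES, RANDOMIZED_NT)
print(f"sequence space: {space}")
print(f"fold coverage: {fold:.1f}")
print(f"expected fraction of variants absent: {missed:.1e}")
```

Under these assumptions the fold coverage exceeds 10, in line with the figure given in the text.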

Plasmid constructs. Additional PUF site-directed mutants carried by pACT2 were generated using the QuikChange site-directed mutagenesis kit (Agilent). The pTYB3-PUF mutants for in vitro protein expression were created by PCR amplification from yeast expression plasmids and subcloning into the pTYB3 plasmid using NcoI and SapI restriction sites. To generate the ESFs that recognize C-containing target sequences, plasmids encoding the RS-PUF or Gly-PUF fusion proteins (16) were mutated.

Liquid β-galactosidase assays. The activity of β-galactosidase was measured in 96-well plates using 12 clones from each sample (20). The yeast colonies were randomly picked and inoculated into 12 different wells with 100 μl of culture medium in a 96-well plate. After overnight growth at 24° C. with shaking, the culture density of each well was determined by reading OD₆₅₀ with a plate-type spectrophotometer (SpectraMax Plus from Molecular Devices). For each clone, 25 μl of cell culture was transferred into a new 96-well plate and mixed with 225 μl of assay buffer (60 mM Na₂HPO₄, 40 mM NaH₂PO₄, 1 mM MgCl₂, 0.2% (wt/vol) Sarkosyl and 0.4 mg/ml ONPG). The plate was incubated at 37° C. for 2 hours and 100 μl of 1 M carbonate solution was added to each well to stop the reaction. The OD₄₀₅ was measured with a spectrophotometer to quantify the product (nitrophenol). The β-galactosidase units were calculated as the difference in OD₄₀₅ between the sample and the background, calibrated by culture density (20).
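The unit calculation described above can be sketched as follows. This is an illustration only: it reads "calibrated by culture density" as division by OD₆₅₀, which is an assumption (the cited protocol (20) may apply additional scaling), and all readings shown are hypothetical.

```python
def beta_gal_units(od405_sample: float, od405_background: float, od650_culture: float) -> float:
    """Background-subtracted OD405 normalized to culture density (OD650).

    Assumption: normalization is a simple division by OD650.
    """
    if od650_culture <= 0:
        raise ValueError("culture density must be positive")
    return (od405_sample - od405_background) / od650_culture


# Hypothetical plate readings for three clones: (OD405, OD650)
readings = [(0.85, 0.50), (0.65, 0.40), (1.05, 0.60)]
background = 0.05  # hypothetical OD405 of a no-cell control well

units = [beta_gal_units(a405, background, a650) for a405, a650 in readings]
mean_units = sum(units) / len(units)
print([round(u, 2) for u in units], round(mean_units, 2))
```

Averaging over the 12 clones picked per sample would follow the same pattern with a longer list of readings.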

Protein expression, purification and electrophoretic mobility shift assay (EMSA). All proteins were expressed in E. coli strain BL21 and purified as described previously (9,11). Protein purity was examined by SDS-PAGE. Protein concentration was determined by Bradford assay. RNAs were generated by in vitro transcription and purified on denaturing gels. 50 pmol of RNAs were labeled at the 3′ end with biotinylated cytidine bisphosphate using T4 RNA ligase following manufacturer's directions (Thermo Scientific Pierce RNA 3′ End Biotinylation Kit). In each sample, 20 fmol of labeled RNA (1 nM) and 4 pmol of proteins (0.2 μM) were incubated in binding buffer (10 mM HEPES, pH 7.3, 20 mM KCl, mM MgCl₂, 1 mM DTT, and 0.1 g/L tRNA) for 1 hour at room temperature. The binding reactions were separated by electrophoresis on 6% nondenaturing PAGE run with 1×TBE at 4° C., transferred to nylon membranes, and crosslinked to the membrane by UV. Biotin-labeled RNA was detected by chemiluminescence using the Thermo Scientific LightShift Chemiluminescent RNA EMSA Kit following manufacturer's directions.
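The amounts and concentrations quoted for the binding reaction are mutually consistent, as the following arithmetic sketch shows. The helper names are illustrative; the reaction volume is not stated in the text and is derived here from the RNA amount and concentration.

```python
def reaction_volume_ul(amount_mol: float, conc_molar: float) -> float:
    """Volume in microliters at which amount_mol corresponds to conc_molar."""
    return amount_mol / conc_molar * 1e6


def molar_conc(amount_mol: float, volume_ul: float) -> float:
    """Molar concentration of amount_mol dissolved in volume_ul microliters."""
    return amount_mol / (volume_ul * 1e-6)


rna_mol = 20e-15     # 20 fmol labeled RNA
rna_conc = 1e-9      # 1 nM
protein_mol = 4e-12  # 4 pmol protein

volume = reaction_volume_ul(rna_mol, rna_conc)   # implied reaction volume
protein_conc = molar_conc(protein_mol, volume)   # should match the stated 0.2 uM
excess = protein_mol / rna_mol                   # molar excess of protein over RNA

print(f"implied volume: {volume:.0f} ul")
print(f"protein concentration: {protein_conc * 1e6:.2f} uM")
print(f"protein:RNA molar ratio: {excess:.0f}")
```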

Crystallization, Structure Determination and Refinement. Crystals of PUF-R6(SYxxR) mutant and C3 RNA (5′-AUUGCAUAUA-3′, SEQ ID NO:73) were grown by sitting drop vapor diffusion. RNA oligonucleotide was obtained from Dharmacon (Lafayette, Colo.). The protein-RNA complex was prepared by mixing a 1:1.1 molar ratio of purified protein (3.5 mg/ml) and RNA in a buffer containing 20 mM Tris-HCl, pH 7.5; 100 mM NaCl; and 1 mM DTT. One μl of complex solution was added to 1 μl of a well solution containing 30% PEG 3350, 0.2 M ammonium tartrate dibasic, and 0.1 M bis-Tris, pH 5.5. Crystals were flash frozen after adding an equal volume of cryoprotectant solution (32% PEG 3350, 20% ethylene glycol) to the drop. Diffraction data were collected at the SER-CAT beamline ID-22, Advanced Photon Source at wavelength 1.0 Å and −180° C. All data sets were indexed, integrated, and scaled with the HKL2000 suite (21). The structure was determined by molecular replacement using the structure of human Pumilio1 (PDB ID: 1M8Y) as a search model with PHASER (22). Two complexes are present in the asymmetric unit. Iterative model building was performed with COOT (23), and the resulting models were refined with PHENIX (24). All Phi-Psi angles are within allowable regions of the Ramachandran plot. The atomic coordinates and structure factors have been deposited in the Protein Data Bank (PDB ID: 2yjy).

Cell culture, transfection, RNA purification, and RT-PCR. Human embryonic kidney 293T cells or breast cancer MDA-MB-231 cells were grown in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum. Cells were seeded onto 24-well plates and transfected with Lipofectamine 2000 following manufacturer's directions. The purification of total RNA and semi-quantitative RT-PCR were carried out as described previously (16).

Bioinformatic analyses. Two PUF proteins in budding yeast, Puf2p and Nop9p, were identified by searching the SMART database. The Saccharomyces Genome Database was also searched using BLASTP with the following queries: (1) the two natural yeast PUF repeats containing a possible C-recognition code, and (2) all the yeast PUF repeats in which the native RNA recognition motifs were replaced with SxxxR (SEQ ID NO:143). Only the two PUF repeats from Puf2p and Nop9p were identified. The entire PUF domains of Puf2p and Nop9p were used as queries to search the non-redundant protein sequences using PSI-BLAST (Position-Specific Iterated BLAST), and the positive hits were manually inspected to filter out repeats. A subset of representative sequences with significant matches was selected to cover a diverse range of organisms. These sequences were aligned with ClustalW and a phylogenetic tree was generated with Phylowidget.

Random library screen for cytosine recognition. To select a PUM repeat that specifically recognizes cytosine, a yeast 3-hybrid (Y3H) system was used that utilizes co-expression of the PUF domain fused with the Gal-4 activation domain (AD), an RNA target with an MS2 binding site, and an MS2-LexA fusion protein (FIG. 27, Panel B). This system can be used to reliably measure the relative binding affinity between RNA and protein (18,19). For this screening, a uridine-to-cytosine mutation was introduced at the 3rd position of a wild-type PUF target sequence (FIG. 27, Panel A). A PUF domain library was generated with random sequences at the 1st and 5th positions of the RNA-interaction motif in repeat 6, which recognizes the third position of the RNA target sequence (FIG. 27, Panels A and C). In control experiments, co-expression of wild-type PUF and its cognate target sequence (U3) resulted in activation of HIS3 and LacZ reporter genes. In contrast, wild-type PUF cannot recognize the target RNA with a cytosine at the third position (C3), suggesting that the screen has a low false positive background (FIG. 27, Panel D). Yeast transformants were screened first for HIS3 expression, and 200 of the resulting positive clones were reconfirmed with a LacZ activity assay. Plasmids encoding functional PUFs were recovered from the doubly-positive yeast clones (178 clones), and a subset were sequenced to identify amino acid combinations directing cytosine recognition.

Of the 19 unambiguous sequences obtained, 18 coded for serine at amino acid position 1043 and arginine at amino acid position 1047, positions 1 and 5 in the 5-residue RNA-interaction motif (Table 3, FIG. 27, Panel A). The only exception, clone 15, contained a stop codon at position 1047, and therefore is likely a false positive. The 18 clones encoding S1043/R1047 contained four different serine codons and six arginine codons, suggesting that the screen adequately covered sequence space. Identification of a set of cytosine-specific RNA recognition side chains (G/A/S/T/CxxxR, SEQ ID NO:78) was published (25). The more stringent conditions used in the studies described herein (10 mM vs. 0.5 mM 3-AT) may have produced the dominance of the SYxxR (SEQ ID NO:4) sequence over other sets of side chains with arginine at the 5th position as seen in this other study (25). The relative β-galactosidase activities for the different sets of side chains suggest that the SYxxR (SEQ ID NO:4) combination binds most tightly (18, 25).
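The codon diversity noted above can be tallied directly from the clone entries in Table 3. The sketch below is illustrative only; it uses the rows as they appear in Table 3 (which lists no entry for clone #4), omits the ambiguous clone #18, and includes only the standard-genetic-code codons needed for the lookup.

```python
from collections import Counter

# (clone, codon at position 1043, codon at position 1047), as listed in Table 3.
# Clone #18 is omitted (ambiguous first codon); Table 3 lists no clone #4.
CLONES = [
    (1, "AGT", "AGA"), (2, "AGT", "AGA"), (3, "AGT", "AGG"), (5, "AGT", "AGA"),
    (6, "AGT", "CGC"), (7, "AGT", "AGG"), (8, "TCC", "CGA"), (9, "AGT", "CGG"),
    (10, "AGT", "CGG"), (11, "AGT", "AGA"), (12, "AGT", "AGG"), (13, "TCT", "AGG"),
    (14, "TCA", "CGT"), (15, "AAT", "TAG"), (16, "AGT", "AGG"), (17, "AGT", "AGA"),
    (19, "AGT", "AGA"), (20, "AGT", "AGG"),
]

# Subset of the standard genetic code sufficient for these clones.
CODON_TABLE = {
    "AGT": "Ser", "AGC": "Ser", "TCT": "Ser", "TCC": "Ser", "TCA": "Ser", "TCG": "Ser",
    "AGA": "Arg", "AGG": "Arg", "CGT": "Arg", "CGC": "Arg", "CGA": "Arg", "CGG": "Arg",
    "AAT": "Asn", "CAA": "Gln", "TAG": "stop",
}

# Count amino acid pairs and collect the distinct codons used for Ser and Arg.
pairs = Counter((CODON_TABLE[c1], CODON_TABLE[c5]) for _, c1, c5 in CLONES)
ser_codons = {c1 for _, c1, _ in CLONES if CODON_TABLE[c1] == "Ser"}
arg_codons = {c5 for _, _, c5 in CLONES if CODON_TABLE[c5] == "Arg"}

print(pairs)
print(sorted(ser_codons), sorted(arg_codons))
```

Recovering the same amino acid pair from several distinct codons is what supports the statement that the selection acted on the encoded residues rather than on a single over-represented DNA sequence.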

To examine the specificity of the newly identified C-recognition code, the RNA-protein interaction between PUF domains and RNA targets containing each of the four bases at the 3rd position (FIG. 27, Panel E) was measured. It was found using a Y3H assay that wild-type PUF bound only to the natural target sequence with a U at the 3rd position (U3), and a mutant protein, PUF-Eco, with an EcoRI site inserted between positions 1043 and 1047 did not recognize any of the target RNAs (FIG. 27, Panel E). The PUF with S1043/R1047 mutations in repeat 6, PUF-R6(SYxxR), specifically bound to the C3-containing target with affinity similar to that of the PUF-WT protein for U3 RNA (18) and did not recognize targets with an A3 or G3. Residual binding of PUF-R6(SYxxR) to the wild-type U3 sequence was measured, likely due to the lack of a stacking side chain (asparagine) in repeat 7.

To further confirm RNA binding, the recombinant PUF protein was purified and an electrophoretic mobility shift assay (EMSA) was used to demonstrate direct binding of PUF-R6(SYxxR) to C3 RNA. Given the direct and specific interaction in this in vitro assay, it was concluded that the expression of LacZ was indeed caused by direct RNA-protein binding.

The cytosine-recognition code can be transferred to other PUM repeats. To examine the modularity of the C-binding code identified using PUM repeat 6, the code was applied to PUM repeats 2 and 5 that normally bind to U7 and A4, respectively. It was then tested whether such changes specify cytosine recognition at the cognate positions (C7 for repeat 2 and C4 for repeat 5) using the Y3H assay. As expected, mutation of the conserved RNA-interacting positions in repeat 2 (positions 899-903 becoming SYxxR, SEQ ID NO:4) changed binding specificity from U7 to C7, while wild-type PUF did not recognize a C7 RNA target (FIG. 28, Panel A). Unlike PUF-R6(SYxxR), PUF-R2(SYxxR) did not recognize the wild-type U7 RNA sequence.

Similarly, mutations in repeat 5 (C1007S/Q1011R or SRxxR, SEQ ID NO:79) are sufficient to change the binding specificity from A4 to C4, while wild-type PUF does not recognize a C4 target RNA. Repeat 5 of wild-type PUF has an arginine (R1008) in position to stack with the RNA base, and it was found that the two mutations in the edge-interacting side chains were sufficient for cytosine recognition. Therefore, arginine can serve as the stacking amino acid residue in the C-binding code. However, introduction of a third mutation in repeat 5 (SYxxR (SEQ ID NO:4) in positions 1007-1011) maintained C-binding specificity and may better prevent binding to A4-containing RNA (FIG. 28, Panel B).

Effect of the stacking residue on cytosine recognition. Based on modifications of repeat 5, it was established that arginine can serve as a stacking side chain for cytosine (FIG. 28, Panel B). Studies were conducted to test the effects of the identity of the stacking side chain on cytosine specificity.

The effect of stacking side chain identity was examined using position 1044 in repeat 6 of Pumilio 1, which has the wild-type RNA-interaction motif NYxxQ (SEQ ID NO:80) that recognizes U3. The interaction motif was mutated to SHxxR (SEQ ID NO:81) and binding of the mutant protein, PUF-R6(SHxxR), to wild-type U3 and mutant C3 RNA targets was measured (FIG. 28, Panel C). PUF-R6(SHxxR) binds well to RNA containing C3 and more weakly to U3. When the stacking side chain is changed to tyrosine, PUF-R6(SYxxR), similar effects are seen. It is concluded that specific binding of cytosine can be achieved with Y/H/R as the stacking residue in the cognate repeat.

Another naturally, though uncommonly, occurring side chain at the stacking position is asparagine, as seen in repeat 7 of wild-type Pumilio 1. This repeat specifically recognizes a G base with an SNxxE (SEQ ID NO:82) RNA interaction motif, but the side chain of N1080 is not long enough to form a stacking interaction with G2 (8,11). In order to change the specificity of repeat 7 to cytosine, the base-interacting residues were mutated initially to S1079/R1083. However, it was found that PUF-R7(SNxxR) did not bind to target RNAs containing G2 or C2, as judged by Y3H measurement (FIG. 28, Panel D). When the stacking residue in repeat 7 was also changed to tyrosine (N1080Y), the resulting PUF-R7(SYxxR) bound strongly to C2 and more weakly to G2 (FIG. 28, Panel D). Thus, for cytosine recognition, a side chain forming a stacking interaction with the RNA base appears important for binding.
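The relationship between the motif shorthand (e.g., SYxxR) and the individual point mutations named above (e.g., C1007S/Q1011R, N1080Y) can be made explicit with a small helper. This is a sketch for illustration: the function name is hypothetical, only repeats 5 to 7 are covered, and the wild-type motif for repeat 5 (CRxxQ) is inferred from the residues C1007, R1008 and Q1011 named in the text.

```python
# First residue number of the 5-residue RNA-interaction motif in each repeat.
MOTIF_START = {5: 1007, 6: 1043, 7: 1079}

# Wild-type motifs; "x" marks positions that are not part of the code.
# Repeat 5 is inferred from C1007, R1008 and Q1011 as named in the text.
WILD_TYPE = {5: "CRxxQ", 6: "NYxxQ", 7: "SNxxE"}


def substitutions(repeat: int, new_motif: str) -> list[str]:
    """List the point mutations (e.g. 'N1043S') that convert the wild-type motif."""
    start = MOTIF_START[repeat]
    muts = []
    for offset, (old, new) in enumerate(zip(WILD_TYPE[repeat], new_motif)):
        if "x" in (old, new) or old == new:
            continue  # unspecified or unchanged position
        muts.append(f"{old}{start + offset}{new}")
    return muts


for repeat, motif in [(6, "SYxxR"), (5, "SRxxR"), (5, "SYxxR"), (7, "SNxxR"), (7, "SYxxR")]:
    print(f"R{repeat}({motif}): {'/'.join(substitutions(repeat, motif)) or 'none'}")
```

For repeat 5 with SRxxR, the helper returns the same pair of substitutions quoted in the text (C1007S/Q1011R).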

In addition to residues in the cognate repeat, it was found that the identity of the stacking residue in the following repeat, which also contacts the RNA base, can contribute to the binding affinity at some positions. Most wild-type repeats in Pumilio 1 have tyrosine or arginine as the stacking residue in the following repeat, the only exception being repeat 3 with a histidine in repeat 4. When an attempt was made to transfer the C-binding code to repeat 3, it was found that neither SRxxR (SEQ ID NO:79) nor SYxxR (SEQ ID NO:4) introduced recognition of the cognate C6 (FIG. 28, Panel E). However, mutation of the following stacking residue to tyrosine (H972Y) allowed recognition of the C6 target (FIG. 28, Panel E).

Designed PUF domains recognize targets with multiple cytosines. To extend the studies of the modularity of the C-recognition code, it was sought to engineer new PUFs that can recognize multiple C residues in their target RNA sequences. A PUF was created to recognize the sequence UGCAUACA (C3C7) by combining previously studied modifications in repeats 2 and 6. It was found that only the PUF with both modified repeats, PUF-R6/R2(SYxxR), bound to the C3C7 sequence; neither the wild-type PUF nor PUFs with one modified repeat did so (FIG. 29, Panel A). This binding is specific because PUF-R6/R2(SYxxR) with two modified repeats did not bind to RNAs with one cytosine (C3U7) or no cytosines (wild type U3U7) at cognate positions (FIG. 29, Panel A).

Two PUFs were designed that recognize 8-nt signature sequences in (CUG)n RNA repeats. Expanded (CUG)n RNA repeats cause myotonic dystrophy type 1 (DM1). These toxic RNA repeats accumulate in the nucleus and sequester alternative splicing factors that normally regulate genes important for muscle and heart functions, thus leading to the pathogenesis observed in DM1 (27,28). Through stepwise mutagenesis two PUF domains were generated that recognize different frames of (CUG)n repeats. These proteins could be used to compete with the binding of splicing factors to pathogenic (CUG)n repeats. PUF-D was designed to recognize UGCUGCUG with five mutated repeats [R1(SRxxE, SEQ ID NO:83), R3(SYxxR, SEQ ID NO:4), R4(SYxxE, SEQ ID NO:84), R5(NRxxQ, SEQ ID NO:85), and R6(SYxxR, SEQ ID NO:4)] and PUF-E was designed to recognize GCUGCUGC with mutations in six repeats [R1(SYxxR, SEQ ID NO:4), R2(SYxxE, SEQ ID NO:84), R3(NRxxQ, SEQ ID NO:85), R5(SRxxE, SEQ ID NO:83), R7(SYxxR, SEQ ID NO:4), and R8(SYxxE, SEQ ID NO:84)] (FIG. 29, Panel B). It was found that PUF-D and PUF-E bound strongly to a (CUG)₅ target RNA but not to control RNA, whereas wild-type PUF and intermediate PUFs A to C essentially had no interaction with the (CUG)₅ target (FIG. 29, Panel B). The de novo design of (CUG)n binding PUFs demonstrates the potential to generate new RNA-binding scaffolds that may be used for therapeutic applications.
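The design logic used for the C3C7 and PUF-D targets can be sketched as a lookup: each repeat contacts one base, with repeat 8 binding nucleotide 1 and repeat 1 binding nucleotide 8, and a repeat needs a new motif only where the target differs from the wild-type sequence UGUAUAUA. The code below is a simplified illustration with hypothetical function names; it reports only the general motif per base and does not model the choice of stacking residue or the influence of neighboring repeats, both of which the text shows can matter.

```python
WILD_TYPE_TARGET = "UGUAUAUA"  # wild-type PUF target, 5' to 3'

# General recognition motifs per base ("x" = any residue).
MOTIF_FOR_BASE = {"G": "SxxxE", "A": "(C/S)xxxQ", "U": "NxxxQ", "C": "SYxxR"}


def repeat_for_position(position: int) -> int:
    """Repeat contacting a 1-based RNA position (repeat 8 binds nt 1, repeat 1 binds nt 8)."""
    return 9 - position


def design(target: str) -> dict[int, tuple[str, str]]:
    """Map each repeat that must change to (new base, general motif)."""
    if len(target) != 8 or set(target) - set("ACGU"):
        raise ValueError("target must be 8 nt of A, C, G, U")
    changes = {}
    for pos, (wt, new) in enumerate(zip(WILD_TYPE_TARGET, target), start=1):
        if wt != new:
            changes[repeat_for_position(pos)] = (new, MOTIF_FOR_BASE[new])
    return dict(sorted(changes.items()))


print(design("UGCAUACA"))  # C3C7 target
print(design("UGCUGCUG"))  # PUF-D target
```

For the PUF-D target, the repeats flagged by this sketch are the same five repeats listed in the text (R1, R3, R4, R5 and R6).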

Crystal structure of PUF-R6(SYxxR) and cognate C3-containing RNA. In order to examine how the side chains forming the C-recognition code are used to specifically recognize cytosine, a crystal structure of PUF-R6(SYxxR) in complex with a cognate C3 RNA (5′-AUUGCAUAUA-3′, SEQ ID NO:73) was determined. In the structure, R1047 contacts the O2 and N3 positions of the cytosine (FIG. 30, Panel A). S1043 forms a hydrogen bond with an amino group of the arginine side chain, assisting in positioning R1047. This interaction is similar to the interaction of N1043 and Q1047 in the wild-type protein with the Watson-Crick edge of U3 (FIG. 30, Panel B), although the longer arginine side chain requires the cytosine base ring position to be shifted slightly away from the RNA-binding surface. Interaction with only the known base-interacting side chains is consistent with the ability to transfer C-recognition to other PUM repeats. The crystal structure also indicates that other small side chains could occupy the position of S1043 and alternate conformations of R1047 can recognize the cytosine, but the ability of the serine side chain to assist in positioning the arginine side chain may produce tighter binding (25).

Applying the cytosine-recognition code to designed artificial splicing factors. Engineered splicing factors (ESFs) have been developed by combining a designed PUF domain with different splicing modulation domains to specifically regulate different types of alternative splicing events (16).

To expand the application of ESFs, ESFs were created that can target C-containing elements by fusing either the Gly-rich domain of hnRNP A1 or the RS domain of ASF/SF2 with the PUF-R6(SYxxR) domain that specifically recognizes UGCAUAUA. These ESFs were tested by co-transfecting 293T cells with plasmids expressing the ESF and a splicing reporter containing the cognate 8-nt target sequence in an alternatively spliced cassette exon. Changes in alternative splicing were analyzed using body-labeled RT-PCR (FIG. 31, Panel A) (16). As designed, the Gly-PUF-R6(SYxxR) ESF repressed the inclusion of the cassette exon containing a UGCAUAUA target sequence, whereas the RS-PUF-R6(SYxxR) ESF increased exon inclusion (FIG. 31, Panel A, lanes 2 and 3). Splicing modulation is sequence specific, as control ESFs with non-cognate PUF domains had little effect on exon inclusion (FIG. 31, Panel A, lanes 4 and 5).

ESFs were designed to control the splicing of an endogenous gene using recognition of a C-containing target sequence. The alternative splicing of VEGF-A, an important mediator of angiogenesis and a key anti-tumor target, was chosen for manipulation. The VEGF-A gene contains 8 exons that undergo extensive alternative splicing to produce multiple isoforms. One newly discovered class of isoforms (b isoforms) has anti-angiogenic activity that is opposite to canonical VEGF-A isoforms (29,30). Most solid cancers are associated with a switch from the VEGF-A b isoforms to the pro-angiogenic a isoforms to promote angiogenesis. Thus, restoring the normal splicing balance to the b isoforms may have potential as a new anti-VEGF cancer therapy.

The two classes of VEGF-A isoforms are generated by the alternative use of a 3′ splice site (ss) in exon 8 (FIG. 31, Panel B). Pro-angiogenic isoforms are spliced with a proximal 3′ ss, and the anti-angiogenic b isoforms are spliced with a distal 3′ ss. The choice of alternative 3′ ss is generally controlled by regulatory cis-elements between the proximal and distal splice sites and/or inside the “core” exonic region. Therefore, new PUF domains were designed to specifically recognize sequences in these regions.

Two ESFs were designed to modulate VEGF-A alternative splicing: PUF#1 recognized the sequence GCGGUGAG between the proximal and distal 3′ ss and PUF#2 recognized the sequence CUGAUACA downstream of the distal 3′ ss (FIG. 31, Panel B). The Gly-PUF#1 ESF should inhibit splicing of pro-angiogenic isoforms (VEGF-Axxx), whereas the RS-PUF#2 ESF should promote anti-angiogenic VEGF-Axxxb isoforms; thus both should shift VEGF-A splicing toward the b isoforms. When each ESF was expressed in MDA-MB-231 cells, it was found that either ESF shifted splicing toward the anti-angiogenic isoforms.

The identification of a modular code to recognize cytosine makes it possible to design PUF domains to bind any given sequence and broadens opportunities to create new research tools and therapeutic reagents. This application was demonstrated by developing new ESFs to specifically modulate the alternative splicing of VEGF-A, a key regulator of angiogenesis and cancer growth, and designing PUF domains that recognize pathogenic CUG repeats. Combined with gene delivery tools, such artificial proteins have potential for use as new therapeutic reagents.

References for Example 4

-   1. Auweter, S. D., Oberstrass, F. C., and Allain, F. H. (2006) Nucleic Acids Res 34, 4943-4959
-   2. Crittenden, S. L., Bernstein, D. S., Bachorik, J. L., Thompson, B. E., Gallegos, M., Petcherski, A. G., Moulder, G., Barstead, R., Wickens, M., and Kimble, J. (2002) Nature 417, 660-663
-   3. Wickens, M., Bernstein, D. S., Kimble, J., and Parker, R. (2002) Trends Genet 18, 150-157
-   4. Dubnau, J., Chiang, A. S., Grady, L., Barditch, J., Gossweiler, S., McNeil, J., Smith, P., Buldoc, F., Scott, R., Certa, U., Broger, C., and Tully, T. (2003) Curr Biol 13, 286-296
-   5. Schweers, B. A., Walters, K. J., and Stern, M. (2002) Genetics 161, 1177-1185
-   6. Ye, B., Petritsch, C., Clark, I. E., Gavis, E. R., Jan, L. Y., and Jan, Y. N. (2004) Curr Biol 14, 314-321
-   7. Chen, G., Li, W., Zhang, Q. S., Regulski, M., Sinha, N., Barditch, J., Tully, T., Krainer, A. R., Zhang, M. Q., and Dubnau, J. (2008) PLoS Comput Biol 4, e1000026
-   8. Wang, X., McLachlan, J., Zamore, P. D., and Hall, T. M. (2002) Cell 110, 501-512
-   9. Wang, X., Zamore, P. D., and Hall, T. M. (2001) Mol Cell 7, 855-865
-   10. Lu, G., and Hall, T. M. (2011) Structure 19, 361-367
-   11. Cheong, C. G., and Hall, T. M. (2006) Proc Natl Acad Sci USA 103, 13635-13639
-   12. Ozawa, T., Natori, Y., Sato, M., and Umezawa, Y. (2007) Nat Methods 4, 413-419
-   13. Tilsner, J., Linnik, O., Christensen, N. M., Bell, K., Roberts, I. M., Lacomme, C., and Oparka, K. J. (2009) Plant J 57, 758-770
-   14. Opperman, L., Hook, B., DeFino, M., Bernstein, D. S., and Wickens, M. (2005) Nat Struct Mol Biol 12, 945-951
-   15. Koh, Y. Y., Opperman, L., Stumpf, C., Mandan, A., Keles, S., and Wickens, M. (2009) RNA 15, 1090-1099
-   16. Wang, Y., Cheong, C. G., Hall, T. M., and Wang, Z. (2009) Nat Methods 6, 825-830
-   17. Zhu, D., Stumpf, C. R., Krahn, J. M., Wickens, M., and Hall, T. M. (2009) Proc Natl Acad Sci USA 106, 20192-20197
-   18. Hook, B., Bernstein, D., Zhang, B., and Wickens, M. (2005) RNA 11, 227-233
-   19. Stumpf, C. R., Opperman, L., and Wickens, M. (2008) Methods Enzymol 449, 295-315
-   20. Fox, J. E., Burow, M. E., McLachlan, J. A., and Miller, C. A., 3rd. (2008) Nat Protoc 3, 637-645
-   21. Otwinowski, Z., and Minor, W. (1997) Methods in Enzymology 276, 307-326
-   22. McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C., and Read, R. J. (2007) J Appl Crystallogr 40, 658-674
-   23. Emsley, P., and Cowtan, K. (2004) Acta Crystallogr D Biol Crystallogr 60, 2126-2132
-   24. Adams, P. D., Afonine, P. V., Bunkoczi, G., Chen, V. B., Davis, I. W., Echols, N., Headd, J. J., Hung, L. W., Kapral, G. J., Grosse-Kunstleve, R. W., McCoy, A. J., Moriarty, N. W., Oeffner, R., Read, R. J., Richardson, D. C., Richardson, J. S., Terwilliger, T. C., and Zwart, P. H. (2010) Acta Crystallogr D Biol Crystallogr 66, 213-221
-   25. Filipovska, A., Razif, M. F. M., Nygård, K. K. A., and Rackham, O. (2011) Nat Chem Biol online publication
-   26. Koh, Y. Y., Wang, Y., Qiu, C., Opperman, L., Gross, L., Tanaka Hall, T. M., and Wickens, M. (2011) RNA 17, 718-727
-   27. Wheeler, T. M., and Thornton, C. A. (2007) Curr Opin Neurol 20, 572-576
-   28. Lee, J. E., and Cooper, T. A. (2009) Biochem Soc Trans 37, 1281-1286
-   29. Harper, S. J., and Bates, D. O. (2008) Nat Rev Cancer 8, 880-887
-   30. Qiu, Y., Hoareau-Aveilla, C., Oltean, S., Harper, S. J., and Bates, D. O. (2009) Biochem Soc Trans 37, 1207-1213
-   31. Schultz, J., Milpetz, F., Bork, P., and Ponting, C. P. (1998) Proc Natl Acad Sci USA 95, 5857-5864
-   32. Letunic, I., Doerks, T., and Bork, P. (2009) Nucleic Acids Res 37, D229-232
-   33. Gerber, A. P., Herschlag, D., and Brown, P. O. (2004) PLoS Biol 2, E79
-   34. Thomson, E., Rappsilber, J., and Tollervey, D. (2007) RNA 13, 2165-2174

The abbreviations used are: 3-AT, 3-aminotriazole; VEGF, vascular endothelial growth factor; ONPG, o-nitrophenyl-β-D-galactopyranoside; SMART, Simple Modular Architecture Research Tool; EMSA, electrophoretic mobility shift assay; ss, splice site.

The foregoing is illustrative of the present invention, and is not to be construed as limiting thereof. The invention is defined by the following claims, with equivalents of the claims to be included therein. All publications, patent applications, patents, patent publications, sequences identified by GenBank and/or SNP accession numbers, and other references cited herein are incorporated by reference in their entireties for the teachings relevant to the sentence and/or paragraph in which the reference is presented.

TABLE 1
Kinetic parameters of ASRE. Three independent experiments were performed and the results were fitted to Michaelis-Menten kinetics. The average values and the standard deviations of the independent experiments are listed. Since the activities were calculated assuming 100% of ASRE proteins are active, the rate constant (kcat) of each ASRE is probably underestimated compared to the real value.

ASRE | Target sequences | Km (μM) | kcat (min⁻¹) | kcat/Km (M⁻¹min⁻¹)
Wt | NRE (UGUAUAUA) | 1.1 (±0.3) | 12.0 (±1.0) | 1.1 × 10⁷
6-2/7-2 | 7u6g (UugAUAUA) | 1.3 (±0.2) | 46.5 (±5.0) | 3.6 × 10⁷
6-2/7-2/1-1 | 7u6g1g (UugAUAUg) | 1.6 (±0.8) | 28.7 (±3.0) | 1.8 × 10⁷
531 | 5g3g1g (UGUgUgUg) | 2.0 (±1.2) | 5.1 (±3.3) | 2.5 × 10⁶
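The catalytic efficiencies in Table 1 follow from the listed Km and kcat values once Km is converted from μM to M. The following sketch reproduces that arithmetic for illustration; the function name is hypothetical and only the mean values from the table are used (the standard deviations are not propagated).

```python
# (ASRE, Km in uM, kcat in min^-1), mean values from Table 1
TABLE_1 = [
    ("Wt", 1.1, 12.0),
    ("6-2/7-2", 1.3, 46.5),
    ("6-2/7-2/1-1", 1.6, 28.7),
    ("531", 2.0, 5.1),
]


def catalytic_efficiency(km_um: float, kcat_per_min: float) -> float:
    """Return kcat/Km in M^-1 min^-1, converting Km from micromolar to molar."""
    return kcat_per_min / (km_um * 1e-6)


efficiencies = {name: catalytic_efficiency(km, kcat) for name, km, kcat in TABLE_1}
for name, value in efficiencies.items():
    print(f"{name:12s} {value:.2e}")
```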

TABLE 2
Linker Sequences

Original ID# | AA sequence | PDB ID (amino acids from) | Possible secondary structure | Possible length (Å) | Result
1ayrA_9 | VDT | 1AYR (347-349) | EEE | 7.307 | specific cleavage (weak)
1al04_1 | VDEVGYPREPAPVEFI (SEQ ID NO: 20) | 1AL0 (63-78) | HHHHTTCCCCCCCHHHH | 10.93 | specific cleavage (non-specific cleavage observed in longer incubation)
1qlaB_1 | VDTGNGS (SEQ ID NO: 21) | 1QLB (105-111) | EECHHHH | 12.098 | specific cleavage
1qaxA_3 | VDMALHARNIA (SEQ ID NO: 22) | 1QAX (381-390) | HHHHTTTHHH | 14.191 | specific cleavage
1sesA_1 | VDLLALDREVQEL* (SEQ ID NO: 23) | 1SES (30-40) | HHHHHHHHHHH | 15.132 | specific cleavage
 | LLALDREVQE (SEQ ID NO: 24) | | HHHHHHHHHH | ? | Weak specific cleavage
 | LLALDREVQ (SEQ ID NO: 25) | | HHHHHHHHH | ? | specific cleavage
 | LLALDREV (SEQ ID NO: 26) | | HHHHHHHH | ? | Non-specific cleavage
1pfkA_2 | VDHIQRGGSP (SEQ ID NO: 27) | 1PFK (247-256) | CGGGGGCSCC | 17.728 | specific cleavage
1doc_l | VDRRMARDGLVH (SEQ ID NO: 28) | 1DOC (61-72) | CCHHHHHHCEEE | 20.463 | specific cleavage

DSSP code for secondary structure:
G = 3-turn helix (3₁₀ helix). Min length 3 residues.
H = 4-turn helix (α helix). Min length 4 residues.
I = 5-turn helix (π helix). Min length 5 residues.
T = hydrogen bonded turn (3, 4 or 5 turn)
E = extended strand in parallel and/or anti-parallel β-sheet conformation. Minimal length 2 residues.
B = residue in isolated β-bridge (single pair β-sheet hydrogen bond formation)
S = bend (the only non-hydrogen-bond based assignment)
*Deletions of one aa (one pitch 3 amino acid) from C-terminus each time to change the relative angle between PIN and PUF
*From linker data base
The underlined sequences are changed from original protein sequence to accommodate a SalI restriction site. In 1qlaB_1 the last two aromatic amino acids (W,F) are changed to G and S

TABLE 3
Nucleotide sequences recovered from the Y3H screen. In total 20 independent clones were sequenced and the resulting codons are listed with the encoded amino acid residue in parentheses. The residue at position 1044 (position 2 in the 5-residue RNA-interaction motif) is tyrosine for all clones. Clone #18 did not have an unambiguous sequence for the first codon (indicating either an A or C in the second position) and thus was disregarded in the analyses.

Clone | AA position 1043 | AA position 1047
Wild type | AAT(Asn) | CAA(Gln)
#1 | AGT(Ser) | AGA(Arg)
#2 | AGT(Ser) | AGA(Arg)
#3 | AGT(Ser) | AGG(Arg)
#5 | AGT(Ser) | AGA(Arg)
#6 | AGT(Ser) | CGC(Arg)
#7 | AGT(Ser) | AGG(Arg)
#8 | TCC(Ser) | CGA(Arg)
#9 | AGT(Ser) | CGG(Arg)
#10 | AGT(Ser) | CGG(Arg)
#11 | AGT(Ser) | AGA(Arg)
#12 | AGT(Ser) | AGG(Arg)
#13 | TCT(Ser) | AGG(Arg)
#14 | TCA(Ser) | CGT(Arg)
#15 | AAT(Asn) | TAG (stop)
#16 | AGT(Ser) | AGG(Arg)
#17 | AGT(Ser) | AGA(Arg)
#18 | A(A|C)T(Asn|Thr) | CGG(Arg)
#19 | AGT(Ser) | AGA(Arg)
#20 | AGT(Ser) | AGG(Arg)

TABLE 4
Oligonucleotides

Serial Number | Name | Sequence | SEQ ID NO:
1 | PufEF1 | CACGAATTCAAGGCCGCAGCCGCCTTTTG | 29
2 | PufSR1 | CACGTCGACCCCTAAGTCAACACCGTTCTTC | 30
3 | Smg6SF1 | CACGTCGACACCGGCAACGGCTCTCAGATGGAGCTCGAAATCAGACC | 31
4 | Smg6NR1 | CACGCGGCCGCTTAGCCCACCTGGGCCCAC | 32
5 | Smg6EF1 | CACGAATTCAACAGATGGAGCTCGAAATCAGACC | 33
6 | Loop7.3SF1 | CACGTCGACACTCAGATGGAGCTCGAAATC | 34
7 | Loop_20SF1 | CACGTCGACCGTCGTATGGCTCGTGATGGTCTTGTTCATCATCAGATGGAGCTCGAAATC | 35
8 | SmSubHF1 | CACAAGCTTCCCTCATTGACTAGAGCTCC | 36
9 | GUXbR1 | CACTCTAGATGCATGCTCGACTTGGAAAAC | 37
10 | Puf87621_XhF1 | GGTTCTCGAGAATGTGATAAGTTCGGGC | 38
11 | Puf6-2/7-21G_XhF | GGTTCTCTCGAGAATTTGATATGTTCGGGC | 39
12 | pET43.1b_PufEF1 | CACGAATTCTGGCCGCAGCCGCCTTTTG | 40
13 | GUnested_XbR1 | CACTCTAGACCCTCACCTTGAATTGGGCC | 41
14 | GUHIF1 | ACCCAAGCTTGGTACCGAGCTTTTTTTTTTTTTTTTTT | 42
15 | GUHIF2 | ACCCAAGCTTGGTACCGAGC | 43
16 | 3′RACE BamF1 | CACGGATCCGCATATTAGATTGCACCACC | 44
17 | AdhF1 | GGAGTGTTGTGGTGTAGTC | 45
18 | AdhR1 | GGACCCACTTCTGCCACCAC | 46
19 | LacZF1 | CCTGCTGATGAAGCAGCAGAACA | 47
20 | LacZR1 | AGACGATTCATTGGCACCAT | 48
21 | FtsZF1 | GAGGATCGCGATGCATTGC | 49
22 | FtsZR1 | CTTCAGCGACGACTGGTGC | 50
23 | Smg6-Pinf2 | CACGGATCGAATTCATGGAGCTCGAAATCAGAC | 51
24 | Smg6-Pinr2 | CACGCTAGCGCCCACCTGGGCCCAC | 52
25 | C-PufF | CACGCTAGCACCGGCAACGGCTCTGGCCGCAGCCGCTTTTG | 53
26 | C-PufR | CACGCGGCCGCTTACCCTAAGTCAACACCGTTCTTC | 54

TABLE 5
Nomenclature of ASREs and their targets

Target sequences, mutations are marked in lower case (seq names) | Repeats with mutations in ASRE | ASRE nomenclature
UGUAUAUA (NRE) | - | Wt
UGUAUgUA (3g) | 3 | 3-2
UugAUAUA (7u6g) | 7, 6 | 6-2/7-2
UugAUAUg (7u6g1g) | 7, 6, 1 | 6-2/7-2/1-1
UGUgUgUg (5g3g1g) | 5, 3, 1 | 531
gugAUAag (8g7u6g2a1g) | 8, 7, 6, 2, 1 | 87621
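The sequence names and mutated-repeat lists in Table 5 can be derived from the lower-case letters in each target, since the base at position i (counted from the 5′ end) is contacted by repeat 9 − i. The helpers below are an illustrative sketch with hypothetical names; they reproduce the sequence-name and repeat columns only, not the ASRE nomenclature column.

```python
def mutated_repeats(target: str) -> list[int]:
    """Repeats whose cognate base is mutated (lower case) in an 8-nt target."""
    if len(target) != 8:
        raise ValueError("expected an 8-nt target")
    # Position i (1-based, 5' to 3') is contacted by repeat 9 - i.
    return [9 - i for i, base in enumerate(target, start=1) if base.islower()]


def sequence_name(target: str) -> str:
    """Build the sequence name, e.g. '7u6g' for 'UugAUAUA'."""
    return "".join(f"{9 - i}{b}" for i, b in enumerate(target, start=1) if b.islower())


for target in ["UGUAUgUA", "UugAUAUA", "UugAUAUg", "UGUgUgUg", "gugAUAag"]:
    print(target, sequence_name(target), mutated_repeats(target))
```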

1.-42. (canceled)
43. A method of cleaving a target mRNA in a sample, comprising contacting the sample with a synthetic site-specific RNA endonuclease which comprises: (i) an RNA binding protein domain that comprises a variant of the human Pumilio 1 homology domain, wherein said variant comprises eight 36-mer repeats, wherein each of the eight 36-mer repeats binds to a single nucleotide in an eight-nucleotide target RNA, wherein said eight-nucleotide target RNA comprises at least one cytosine, and wherein said variant comprises all of amino acids 828 to 1176 of SEQ ID NO:149 except for modifications at one or more positions corresponding to: (a) positions 863 to 867 of SEQ ID NO:149, (b) positions 899 to 903 of SEQ ID NO:149, (c) positions 935 to 939 of SEQ ID NO:149, (d) positions 971 to 975 of SEQ ID NO:149, (e) positions 1007 to 1011 of SEQ ID NO:149, (f) positions 1043 to 1047 of SEQ ID NO:149, (g) positions 1079 to 1083 of SEQ ID NO:149, and/or (h) positions 1122 to 1126 of SEQ ID NO:149, wherein said modifications result in the variant comprising, in any combination, SerXXXGlu (SEQ ID NO:1) to bind guanine, (Cys/Ser)XXXGln (SEQ ID NO:66) to bind adenine, AsnXXXGln (SEQ ID NO:3) to bind uracil, and/or SerTyrXXArg (SEQ ID NO:4) to bind cytosine, wherein X is any amino acid; (ii) a linker peptide; and (iii) a cleavage domain that comprises a PilT N-terminus (PIN) domain of human SMG6, wherein the RNA binding domain is at the amino terminus of the synthetic site-specific RNA endonuclease and the cleavage domain is at the carboxy terminus of the synthetic site-specific RNA endonuclease, and wherein the cleavage domain cleaves upstream and/or downstream of the eight-nucleotide target RNA, under conditions whereby cleavage of the target mRNA occurs and wherein the RNA binding domain of the RNA endonuclease is modified to bind the target mRNA, thereby cleaving the target mRNA in the sample.
 44. A method of cleaving a target mRNA in a cell, comprising introducing into the cell a synthetic site-specific RNA endonuclease which comprises: (i) an RNA binding protein domain that comprises a variant of the human Pumilio 1 homology domain, wherein said variant comprises eight 36-mer repeats, wherein each of the eight 36-mer repeats binds to a single nucleotide in an eight-nucleotide target RNA, wherein said eight-nucleotide target RNA comprises at least one cytosine, and wherein said variant comprises all of amino acids 828 to 1176 of SEQ ID NO:149 except for modifications at one or more positions corresponding to: (a) positions 863 to 867 of SEQ ID NO:149, (b) positions 899 to 903 of SEQ ID NO:149, (c) positions 935 to 939 of SEQ ID NO:149, (d) positions 971 to 975 of SEQ ID NO:149, (e) positions 1007 to 1011 of SEQ ID NO:149, (f) positions 1043 to 1047 of SEQ ID NO:149, (g) positions 1079 to 1083 of SEQ ID NO:149, and/or (h) positions 1122 to 1126 of SEQ ID NO:149, wherein said modifications result in the variant comprising, in any combination, SerXXXGlu (SEQ ID NO:1) to bind guanine, (Cys/Ser)XXXGln (SEQ ID NO:66) to bind adenine, AsnXXXGln (SEQ ID NO:3) to bind uracil, and/or SerTyrXXArg (SEQ ID NO:4) to bind cytosine, wherein X is any amino acid; (ii) a linker peptide; and (iii) a cleavage domain that comprises a PilT N-terminus (PIN) domain of human SMG6, wherein the RNA binding domain is at the amino terminus of the synthetic site-specific RNA endonuclease and the cleavage domain is at the carboxy terminus of the synthetic site-specific RNA endonuclease, and wherein the cleavage domain cleaves upstream and/or downstream of the eight-nucleotide target RNA, wherein the RNA binding domain of the RNA endonuclease is modified to bind the target mRNA, under conditions whereby cleavage of the mRNA occurs, thereby cleaving the target mRNA in the cell.
 45. A method of inhibiting expression of a target gene in a cell, comprising introducing into the cell a synthetic site-specific RNA endonuclease which comprises: (i) an RNA binding protein domain that comprises a variant of the human Pumilio 1 homology domain, wherein said variant comprises eight 36-mer repeats, wherein each of the eight 36-mer repeats binds to a single nucleotide in an eight-nucleotide target RNA, wherein said eight-nucleotide target RNA comprises at least one cytosine, and wherein said variant comprises all of amino acids 828 to 1176 of SEQ ID NO:149 except for modifications at one or more positions corresponding to: (a) positions 863 to 867 of SEQ ID NO:149, (b) positions 899 to 903 of SEQ ID NO:149, (c) positions 935 to 939 of SEQ ID NO:149, (d) positions 971 to 975 of SEQ ID NO:149, (e) positions 1007 to 1011 of SEQ ID NO:149, (f) positions 1043 to 1047 of SEQ ID NO:149, (g) positions 1079 to 1083 of SEQ ID NO:149, and/or (h) positions 1122 to 1126 of SEQ ID NO:149, wherein said modifications result in the variant comprising, in any combination, SerXXXGlu (SEQ ID NO:1) to bind guanine, (Cys/Ser)XXXGln (SEQ ID NO:66) to bind adenine, AsnXXXGln (SEQ ID NO:3) to bind uracil, and/or SerTyrXXArg (SEQ ID NO:4) to bind cytosine, wherein X is any amino acid; (ii) a linker peptide; and (iii) a cleavage domain that comprises a PilT N-terminus (PIN) domain of human SMG6, wherein the RNA binding domain is at the amino terminus of the synthetic site-specific RNA endonuclease and the cleavage domain is at the carboxy terminus of the synthetic site-specific RNA endonuclease, and wherein the cleavage domain cleaves upstream and/or downstream of the eight-nucleotide target RNA, wherein the RNA binding domain of the RNA endonuclease is modified to bind mRNA encoding a gene product of the target gene, under conditions whereby cleavage of the mRNA occurs, thereby inhibiting expression of the target gene in the cell.
 46. The method of claim 45, wherein the RNA endonuclease is introduced into the cell via a viral vector comprising a nucleotide sequence encoding the RNA endonuclease.
 47. The method of claim 46, further comprising stably integrating the nucleotide sequence encoding the RNA endonuclease into the genome of the cell.
 48. The method of claim 45, wherein the cell is in an organism.
 49. The method of claim 44, wherein the RNA binding domain of the RNA endonuclease is modified to bind a target mRNA in a mitochondrion and wherein the RNA endonuclease comprises a mitochondrial targeting signal sequence.
 50. The method of claim 45, wherein the RNA binding domain of the RNA endonuclease is modified to bind mRNA encoding a gene product of a target mitochondrial gene and wherein the RNA endonuclease comprises a mitochondrial targeting signal sequence.
 51. The method of claim 44, wherein the RNA endonuclease is introduced into the cell via a viral vector comprising a nucleotide sequence encoding the RNA endonuclease.
 52. The method of claim 51, further comprising stably integrating the nucleotide sequence encoding the RNA endonuclease into the genome of the cell.
 53. The method of claim 44, wherein the cell is in an organism.
 54. The method of claim 49, wherein the RNA endonuclease is introduced into the cell via a viral vector comprising a nucleotide sequence encoding the RNA endonuclease.
 55. The method of claim 54, further comprising stably integrating the nucleotide sequence encoding the RNA endonuclease into the genome of the cell.
 56. The method of claim 49, wherein the cell is in an organism.
 57. The method of claim 50, wherein the RNA endonuclease is introduced into the cell via a viral vector comprising a nucleotide sequence encoding the RNA endonuclease.
 58. The method of claim 57, further comprising stably integrating the nucleotide sequence encoding the RNA endonuclease into the genome of the cell.
 59. The method of claim 50, wherein the cell is in an organism.
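
By way of illustration only, the nucleotide recognition motifs recited in claims 43 to 45 can be expressed as a lookup table. The following is a minimal sketch under stated assumptions and does not limit or interpret the claims. The names `RECOGNITION_MOTIFS`, `motifs_for_target` and `has_cytosine` are hypothetical, and the sketch does not assign motifs to particular repeats or to particular positions of SEQ ID NO:149.

```python
# Illustrative sketch only: look up the recited recognition motif for each
# nucleotide of an eight-nucleotide target RNA. X denotes any amino acid.

RECOGNITION_MOTIFS = {
    "G": "SerXXXGlu",        # SEQ ID NO:1, binds guanine
    "A": "(Cys/Ser)XXXGln",  # SEQ ID NO:66, binds adenine
    "U": "AsnXXXGln",        # SEQ ID NO:3, binds uracil
    "C": "SerTyrXXArg",      # SEQ ID NO:4, binds cytosine
}


def motifs_for_target(target):
    """Return (nucleotide, motif) pairs in the order the target is written."""
    target = target.upper()
    if len(target) != 8:
        raise ValueError("target RNA must be eight nucleotides long")
    unknown = sorted(set(target) - set(RECOGNITION_MOTIFS))
    if unknown:
        raise ValueError(f"unsupported nucleotide(s): {', '.join(unknown)}")
    return [(nucleotide, RECOGNITION_MOTIFS[nucleotide]) for nucleotide in target]


def has_cytosine(target):
    """Check the recited condition that the target has at least one cytosine."""
    return "C" in target.upper()


if __name__ == "__main__":
    example = "UGUACAUA"  # hypothetical target chosen for illustration
    for nucleotide, motif in motifs_for_target(example):
        print(nucleotide, motif)
    print("contains cytosine:", has_cytosine(example))
```

The example target is hypothetical. The wild-type NRE of Table 5 (UGUAUAUA) contains no cytosine, so `has_cytosine` returns False for it.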